4: Lmod has detected the following error: The following module(s) are unknown:
4: "suse-repo-deps/sam-default"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "suse-repo-deps/sam-default"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "rocm/sam-5.2.3"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "rocm/sam-5.2.3"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "rccl/sam-develop"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "rccl/sam-develop"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "aws-ofi-rccl/sam-default"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "aws-ofi-rccl/sam-default"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

[each of the four Lmod errors above is emitted once by every rank 0-7]

0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory

[this warning occurs many times on every rank 0-7]

2: 2023-04-24 12:06:52.057328: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

[this informational message occurs several times, with varying timestamps, on ranks 2-7]
7: 2023-04-24 12:06:52.058818: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 7: 2023-04-24 12:06:52.058822: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: 2023-04-24 12:06:52.058787: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058798: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 7: 2023-04-24 12:06:52.058831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
7: 2023-04-24 12:06:52.058838: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058816: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058824: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058836: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
0: 2023-04-24 12:06:52.058840: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059041: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059057: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059056: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
1: 2023-04-24 12:06:52.059046: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059073: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059070: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059101: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
3: 2023-04-24 12:07:11.205858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 3: 2023-04-24 12:07:11.218934: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 5: 2023-04-24 12:07:11.206332: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:11.206325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206450: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:11.206579: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206470: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:11.219366: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
1: 2023-04-24 12:07:46.584337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
1: 2023-04-24 12:07:46.585509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
1: 2023-04-24 12:07:46.585527: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
0: 2023-04-24 12:07:46.585619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585528: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585638: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585529: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585530: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585530: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:46.585534: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
1: 2023-04-24 12:07:46.585536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:46.585549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585863: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585702: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585713: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585867: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585884: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.586006: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585908: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585909: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
5: 2023-04-24 12:07:46.586043: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.586049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.586063: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.586066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
4: 2023-04-24 12:07:46.591246: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591361: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592571: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592575: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592581: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
4: 2023-04-24 12:07:46.592605: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
3: 2023-04-24 12:07:46.633248: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
3: 2023-04-24 12:07:46.634425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
                       [--hidden-size HIDDEN_SIZE]
                       [--ffn-hidden-size FFN_HIDDEN_SIZE]
                       [--num-attention-heads NUM_ATTENTION_HEADS]
                       [--kv-channels KV_CHANNELS]
                       [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
                       [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
                       [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
                       [--layernorm-epsilon LAYERNORM_EPSILON]
                       [--sync-tp-duplicated-parameters]
                       [--apply-residual-connection-post-layernorm]
                       [--embed-layernorm] [--openai-gelu]
                       [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
                       [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
                       [--glu-activation {geglu,liglu,reglu,swiglu}]
                       [--kill-switch-path KILL_SWITCH_PATH]
                       [--log-level {debug,info,warning,error,critical}]
                       [--log-level-replica {debug,info,warning,error,critical}]
                       [--attention-dropout ATTENTION_DROPOUT]
                       [--hidden-dropout HIDDEN_DROPOUT]
                       [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
                       [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
                       [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
                       [--micro-batch-size MICRO_BATCH_SIZE]
                       [--batch-size BATCH_SIZE]
                       [--global-batch-size GLOBAL_BATCH_SIZE]
                       [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
                       [--checkpoint-activations]
                       [--distribute-checkpointed-activations]
                       [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
                       [--train-iters TRAIN_ITERS]
                       [--train-samples TRAIN_SAMPLES]
                       [--train-tokens TRAIN_TOKENS]
                       [--log-interval LOG_INTERVAL]
                       [--exit-interval EXIT_INTERVAL]
                       [--exit-duration-in-mins EXIT_DURATION_IN_MINS]
                       [--tensorboard-dir TENSORBOARD_DIR]
                       [--no-masked-softmax-fusion] [--no-bias-gelu-fusion]
                       [--no-bias-dropout-fusion] [--no-layer-norm-fusion]
                       [--no-optimizer-fusion] [--optimizer {adam,sgd}]
                       [--use-bnb-optimizer]
                       [--dataloader-type {single,cyclic}] [--cpu-optimizer]
                       [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR]
                       [--eval-only EVAL_ONLY]
                       [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]]
                       [--inference]
                       [--abort-on-unmet-fused-kernel-constraints]
                       [--pp-partition-method PP_PARTITION_METHOD]
                       [--seed SEED] [--init-method-std INIT_METHOD_STD]
                       [--init-method-xavier-uniform] [--lr LR]
                       [--lr-decay-style {constant,linear,cosine}]
                       [--lr-decay-iters LR_DECAY_ITERS]
                       [--lr-decay-samples LR_DECAY_SAMPLES]
                       [--lr-decay-tokens LR_DECAY_TOKENS]
                       [--lr-warmup-fraction LR_WARMUP_FRACTION]
                       [--lr-warmup-iters LR_WARMUP_ITERS]
                       [--lr-warmup-samples LR_WARMUP_SAMPLES]
                       [--warmup WARMUP] [--min-lr MIN_LR]
                       [--override-lr-scheduler]
                       [--use-checkpoint-lr-scheduler]
                       [--universal-checkpoint] [--save SAVE]
                       [--save-interval SAVE_INTERVAL] [--no-save-optim]
                       [--no-save-rng] [--load LOAD] [--no-load-optim]
                       [--no-load-rng] [--finetune] [--fp16] [--bf16]
                       [--loss-scale LOSS_SCALE]
                       [--initial-loss-scale INITIAL_LOSS_SCALE]
                       [--min-loss-scale MIN_LOSS_SCALE]
                       [--loss-scale-window LOSS_SCALE_WINDOW]
                       [--hysteresis HYSTERESIS] [--fp32-residual-connection]
                       [--no-query-key-layer-scaling]
                       [--attention-softmax-in-fp32]
                       [--accumulate-allreduce-grads-in-fp32]
                       [--fp16-lm-cross-entropy]
                       [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
                       [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
                       [--model-parallel-size MODEL_PARALLEL_SIZE]
                       [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
                       [--distributed-backend {nccl,gloo}]
                       [--DDP-impl {local,torch}]
                       [--use-contiguous-buffers-in-ddp]
                       [--no-scatter-gather-tensors-in-pipeline]
                       [--local_rank LOCAL_RANK]
                       [--lazy-mpu-init LAZY_MPU_INIT]
                       [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
                       [--eval-interval EVAL_INTERVAL]
                       [--data-path [DATA_PATH ...]] [--split SPLIT]
                       [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
                       [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
                       [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
                       [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
                       [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
                       [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
                       [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
                       [--merge-file MERGE_FILE]
                       [--vocab-extra-ids VOCAB_EXTRA_IDS]
                       [--seq-length SEQ_LENGTH]
                       [--encoder-seq-length ENCODER_SEQ_LENGTH]
                       [--decoder-seq-length DECODER_SEQ_LENGTH]
                       [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
                       [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
                       [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
                       [--num-workers NUM_WORKERS]
                       [--valid-num-workers VALID_NUM_WORKERS]
                       [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
                       [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
                       [--data-impl {lazy,cached,mmap,infer}]
                       [--reset-position-ids] [--reset-attention-mask]
                       [--eod-mask-loss] [--loss-on-targets-only]
                       [--reweight-loss-based-on-position-frequency]
                       [--noise-density NOISE_DENSITY]
                       [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
                       [--adlr-autoresume]
                       [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
                       [--ict-head-size ICT_HEAD_SIZE]
                       [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
                       [--biencoder-shared-query-context-model]
                       [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
                       [--titles-data-path TITLES_DATA_PATH]
                       [--query-in-block-prob QUERY_IN_BLOCK_PROB]
                       [--use-one-sent-docs]
                       [--evidence-data-path EVIDENCE_DATA_PATH]
                       [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
                       [--retriever-score-scaling]
                       [--block-data-path BLOCK_DATA_PATH]
                       [--embedding-path EMBEDDING_PATH]
                       [--indexer-batch-size INDEXER_BATCH_SIZE]
                       [--indexer-log-interval INDEXER_LOG_INTERVAL]
                       [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
                       [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
                       [--log-params-norm] [--log-num-zeros-in-grad]
                       [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
                       [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
                       [--log-timers-to-tensorboard]
                       [--log-batch-size-to-tensorboard]
                       [--no-log-learnig-rate-to-tensorboard]
                       [--no-log-loss-scale-to-tensorboard]
                       [--log-validation-ppl-to-tensorboard]
                       [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
                       [--zero-contigious-gradients]
                       [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
                       [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
                       [--remote-device {none,cpu,nvme}] [--use-pin-memory]
                       [--scattered-embeddings] [--split-transformers]
                       [--memory-centric-tiled-linear]
                       [--tile-factor TILE_FACTOR]
                       [--deepspeed-activation-checkpointing]
                       [--partition-activations] [--contigious-checkpointing]
                       [--checkpoint-in-cpu] [--synchronize-each-layer]
                       [--profile-backward] [--deepspeed]
                       [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
                       [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
pretrain_gpt.py: error: unrecognized arguments: --reset-progress
TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: 
[--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: 
[--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] 
[--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers 
VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: 
[--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: [--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 6: [--micro-batch-size MICRO_BATCH_SIZE] 6: [--batch-size BATCH_SIZE] 6: [--global-batch-size GLOBAL_BATCH_SIZE] 6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 6: [--checkpoint-activations] 6: [--distribute-checkpointed-activations] 6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 6: [--train-iters TRAIN_ITERS] 6: [--train-samples TRAIN_SAMPLES] 6: [--train-tokens 
TRAIN_TOKENS] 6: [--log-interval LOG_INTERVAL] 6: [--exit-interval EXIT_INTERVAL] 6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 6: [--tensorboard-dir TENSORBOARD_DIR] 6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 6: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 6: [--use-bnb-optimizer] 6: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 6: [--eval-only EVAL_ONLY] 6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 6: [--inference] 6: [--abort-on-unmet-fused-kernel-constraints] 6: [--pp-partition-method PP_PARTITION_METHOD] 6: [--seed SEED] [--init-method-std INIT_METHOD_STD] 6: [--init-method-xavier-uniform] [--lr LR] 6: [--lr-decay-style {constant,linear,cosine}] 6: [--lr-decay-iters LR_DECAY_ITERS] 6: [--lr-decay-samples LR_DECAY_SAMPLES] 6: [--lr-decay-tokens LR_DECAY_TOKENS] 6: [--lr-warmup-fraction LR_WARMUP_FRACTION] 6: [--lr-warmup-iters LR_WARMUP_ITERS] 6: [--lr-warmup-samples LR_WARMUP_SAMPLES] 6: [--warmup WARMUP] [--min-lr MIN_LR] 6: [--override-lr-scheduler] 6: [--use-checkpoint-lr-scheduler] 6: [--universal-checkpoint] [--save SAVE] 6: [--save-interval SAVE_INTERVAL] [--no-save-optim] 6: [--no-save-rng] [--load LOAD] [--no-load-optim] 6: [--no-load-rng] [--finetune] [--fp16] [--bf16] 6: [--loss-scale LOSS_SCALE] 6: [--initial-loss-scale INITIAL_LOSS_SCALE] 6: [--min-loss-scale MIN_LOSS_SCALE] 6: [--loss-scale-window LOSS_SCALE_WINDOW] 6: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 6: [--no-query-key-layer-scaling] 6: [--attention-softmax-in-fp32] 6: [--accumulate-allreduce-grads-in-fp32] 6: [--fp16-lm-cross-entropy] 6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 6: [--model-parallel-size MODEL_PARALLEL_SIZE] 6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 
6: [--distributed-backend {nccl,gloo}] 6: [--DDP-impl {local,torch}] 6: [--use-contiguous-buffers-in-ddp] 6: [--no-scatter-gather-tensors-in-pipeline] 6: [--local_rank LOCAL_RANK] 6: [--lazy-mpu-init LAZY_MPU_INIT] 6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 6: [--eval-interval EVAL_INTERVAL] 6: [--data-path [DATA_PATH ...]] [--split SPLIT] 6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 6: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob 
QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: [--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: 
[--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 6: [--micro-batch-size MICRO_BATCH_SIZE] 6: [--batch-size BATCH_SIZE] 6: [--global-batch-size GLOBAL_BATCH_SIZE] 6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: [--checkpoint-activations] 6: [--distribute-checkpointed-activations] 6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 6: [--train-iters TRAIN_ITERS] 6: [--train-samples TRAIN_SAMPLES] 6: [--train-tokens TRAIN_TOKENS] 6: [--log-interval LOG_INTERVAL] 6: [--exit-interval EXIT_INTERVAL] 6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 6: [--tensorboard-dir TENSORBOARD_DIR] 6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 6: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 6: [--use-bnb-optimizer] 6: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 6: [--eval-only EVAL_ONLY] 6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 6: [--inference] 6: [--abort-on-unmet-fused-kernel-constraints] 6: [--pp-partition-method PP_PARTITION_METHOD] 6: [--seed SEED] [--init-method-std INIT_METHOD_STD] 6: 
[--init-method-xavier-uniform] [--lr LR] 6: [--lr-decay-style {constant,linear,cosine}] 6: [--lr-decay-iters LR_DECAY_ITERS] 6: [--lr-decay-samples LR_DECAY_SAMPLES] 6: [--lr-decay-tokens LR_DECAY_TOKENS] 6: [--lr-warmup-fraction LR_WARMUP_FRACTION] 6: [--lr-warmup-iters LR_WARMUP_ITERS] 6: [--lr-warmup-samples LR_WARMUP_SAMPLES] 6: [--warmup WARMUP] [--min-lr MIN_LR] 6: [--override-lr-scheduler] 6: [--use-checkpoint-lr-scheduler] 6: [--universal-checkpoint] [--save SAVE] 6: [--save-interval SAVE_INTERVAL] [--no-save-optim] 6: [--no-save-rng] [--load LOAD] [--no-load-optim] 6: [--no-load-rng] [--finetune] [--fp16] [--bf16] 6: [--loss-scale LOSS_SCALE] 6: [--initial-loss-scale INITIAL_LOSS_SCALE] 6: [--min-loss-scale MIN_LOSS_SCALE] 6: [--loss-scale-window LOSS_SCALE_WINDOW] 6: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 6: [--no-query-key-layer-scaling] 6: [--attention-softmax-in-fp32] 6: [--accumulate-allreduce-grads-in-fp32] 6: [--fp16-lm-cross-entropy] 6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 6: [--model-parallel-size MODEL_PARALLEL_SIZE] 6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 6: [--distributed-backend {nccl,gloo}] 6: [--DDP-impl {local,torch}] 6: [--use-contiguous-buffers-in-ddp] 6: [--no-scatter-gather-tensors-in-pipeline] 6: [--local_rank LOCAL_RANK] 6: [--lazy-mpu-init LAZY_MPU_INIT] 6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 6: [--eval-interval EVAL_INTERVAL] 6: [--data-path [DATA_PATH ...]] [--split SPLIT] 6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 6: [--test-weighted-split-paths-path 
TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: 
[--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: [--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 
ADAM_BETA2]
6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
6: [--micro-batch-size MICRO_BATCH_SIZE]
6: [--batch-size BATCH_SIZE]
6: [--global-batch-size GLOBAL_BATCH_SIZE]
6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
6: [--checkpoint-activations]
6: [--distribute-checkpointed-activations]
6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
6: [--train-iters TRAIN_ITERS]
6: [--train-samples TRAIN_SAMPLES]
6: [--train-tokens TRAIN_TOKENS]
6: [--log-interval LOG_INTERVAL]
6: [--exit-interval EXIT_INTERVAL]
6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS]
6: [--tensorboard-dir TENSORBOARD_DIR]
6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion]
6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion]
6: [--no-optimizer-fusion] [--optimizer {adam,sgd}]
6: [--use-bnb-optimizer]
6: [--dataloader-type {single,cyclic}] [--cpu-optimizer]
6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR]
6: [--eval-only EVAL_ONLY]
6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]]
6: [--inference]
6: [--abort-on-unmet-fused-kernel-constraints]
6: [--pp-partition-method PP_PARTITION_METHOD]
6: [--seed SEED] [--init-method-std INIT_METHOD_STD]
6: [--init-method-xavier-uniform] [--lr LR]
6: [--lr-decay-style {constant,linear,cosine}]
6: [--lr-decay-iters LR_DECAY_ITERS]
6: [--lr-decay-samples LR_DECAY_SAMPLES]
6: [--lr-decay-tokens LR_DECAY_TOKENS]
6: [--lr-warmup-fraction LR_WARMUP_FRACTION]
6: [--lr-warmup-iters LR_WARMUP_ITERS]
6: [--lr-warmup-samples LR_WARMUP_SAMPLES]
6: [--warmup WARMUP] [--min-lr MIN_LR]
6: [--override-lr-scheduler]
6: [--use-checkpoint-lr-scheduler]
6: [--universal-checkpoint] [--save SAVE]
6: [--save-interval SAVE_INTERVAL] [--no-save-optim]
6: [--no-save-rng] [--load LOAD] [--no-load-optim]
6: [--no-load-rng] [--finetune] [--fp16] [--bf16]
6: [--loss-scale LOSS_SCALE]
6: [--initial-loss-scale INITIAL_LOSS_SCALE]
6: [--min-loss-scale MIN_LOSS_SCALE]
6: [--loss-scale-window LOSS_SCALE_WINDOW]
6: [--hysteresis HYSTERESIS] [--fp32-residual-connection]
6: [--no-query-key-layer-scaling]
6: [--attention-softmax-in-fp32]
6: [--accumulate-allreduce-grads-in-fp32]
6: [--fp16-lm-cross-entropy]
6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
6: [--model-parallel-size MODEL_PARALLEL_SIZE]
6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
6: [--distributed-backend {nccl,gloo}]
6: [--DDP-impl {local,torch}]
6: [--use-contiguous-buffers-in-ddp]
6: [--no-scatter-gather-tensors-in-pipeline]
6: [--local_rank LOCAL_RANK]
6: [--lazy-mpu-init LAZY_MPU_INIT]
6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
6: [--eval-interval EVAL_INTERVAL]
6: [--data-path [DATA_PATH ...]] [--split SPLIT]
6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
6: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
6: [--merge-file MERGE_FILE]
6: [--vocab-extra-ids VOCAB_EXTRA_IDS]
6: [--seq-length SEQ_LENGTH]
6: [--encoder-seq-length ENCODER_SEQ_LENGTH]
6: [--decoder-seq-length DECODER_SEQ_LENGTH]
6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
6: [--num-workers NUM_WORKERS]
6: [--valid-num-workers VALID_NUM_WORKERS]
6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
6: [--data-impl {lazy,cached,mmap,infer}]
6: [--reset-position-ids] [--reset-attention-mask]
6: [--eod-mask-loss] [--loss-on-targets-only]
6: [--reweight-loss-based-on-position-frequency]
6: [--noise-density NOISE_DENSITY]
6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
6: [--adlr-autoresume]
6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
6: [--ict-head-size ICT_HEAD_SIZE]
6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
6: [--biencoder-shared-query-context-model]
6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
6: [--titles-data-path TITLES_DATA_PATH]
6: [--query-in-block-prob QUERY_IN_BLOCK_PROB]
6: [--use-one-sent-docs]
6: [--evidence-data-path EVIDENCE_DATA_PATH]
6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
6: [--retriever-score-scaling]
6: [--block-data-path BLOCK_DATA_PATH]
6: [--embedding-path EMBEDDING_PATH]
6: [--indexer-batch-size INDEXER_BATCH_SIZE]
6: [--indexer-log-interval INDEXER_LOG_INTERVAL]
6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
6: [--log-params-norm] [--log-num-zeros-in-grad]
6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
6: [--log-timers-to-tensorboard]
6: [--log-batch-size-to-tensorboard]
6: [--no-log-learnig-rate-to-tensorboard]
6: [--no-log-loss-scale-to-tensorboard]
6: [--log-validation-ppl-to-tensorboard]
6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
6: [--zero-contigious-gradients]
6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
6: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
6: [--scattered-embeddings] [--split-transformers]
6: [--memory-centric-tiled-linear]
6: [--tile-factor TILE_FACTOR]
6: [--deepspeed-activation-checkpointing]
6: [--partition-activations] [--contigious-checkpointing]
6: [--checkpoint-in-cpu] [--synchronize-each-layer]
6: [--profile-backward] [--deepspeed]
6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
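Every failing rank reports the same argparse error: this checkout of `pretrain_gpt.py` does not define a `--reset-progress` option, so the usual fix is to drop the flag from the launch script or switch to a code version that registers it. A minimal sketch of how argparse produces and avoids this failure (the two-flag parser below is a hypothetical stand-in, not the real Megatron argument table):

```python
import argparse

# Hypothetical stand-in for pretrain_gpt.py's parser (two flags only).
parser = argparse.ArgumentParser(prog="pretrain_gpt.py")
parser.add_argument("--train-iters", type=int)

argv = ["--train-iters", "1000", "--reset-progress"]

# parser.parse_args(argv) would print the usage text and exit with:
#   pretrain_gpt.py: error: unrecognized arguments: --reset-progress

# Option 1: tolerate flags the script does not know.
args, unknown = parser.parse_known_args(argv)
print(unknown)  # ['--reset-progress']

# Option 2: register the flag so parse_args accepts it.
parser.add_argument("--reset-progress", action="store_true")
args = parser.parse_args(argv)
print(args.reset_progress)  # True
```

Because argparse prints the full usage text before exiting, every rank that hits the unknown flag dumps the whole option list, which is why the message repeats once per rank above.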
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
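The interleaved `/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory` lines are a separate issue from the argparse failure: libdrm's amdgpu backend reads that database only to map PCI device IDs to GPU marketing names, and it falls back to generic names when the file is absent, so the warning is generally cosmetic. A quick existence check (the path is taken from the log; that the warning is harmless on your particular stack is an assumption worth verifying):

```python
import os.path

# libdrm's amdgpu backend uses this database only to pretty-print
# GPU marketing names; the log's warning means it could not be opened.
ids = "/opt/amdgpu/share/libdrm/amdgpu.ids"
status = "present" if os.path.isfile(ids) else "missing"
print("amdgpu.ids is", status)
```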
3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7:
[--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: 
[--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: 
[--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] 
[--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters 
EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES 
...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 
0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction 
LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length 
DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size 
ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica 
{debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: 
[--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: 
[--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank 
LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES 
[RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
0: [--retriever-score-scaling]
0: [--block-data-path BLOCK_DATA_PATH]
0: [--embedding-path EMBEDDING_PATH]
0: [--indexer-batch-size INDEXER_BATCH_SIZE]
0: [--indexer-log-interval INDEXER_LOG_INTERVAL]
0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
0: [--log-params-norm] [--log-num-zeros-in-grad]
0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
0: [--log-timers-to-tensorboard]
0: [--log-batch-size-to-tensorboard]
0: [--no-log-learnig-rate-to-tensorboard]
0: [--no-log-loss-scale-to-tensorboard]
0: [--log-validation-ppl-to-tensorboard]
0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
0: [--zero-contigious-gradients]
0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
0: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
0: [--scattered-embeddings] [--split-transformers]
0: [--memory-centric-tiled-linear]
0: [--tile-factor TILE_FACTOR]
0: [--deepspeed-activation-checkpointing]
0: [--partition-activations] [--contigious-checkpointing]
0: [--checkpoint-in-cpu] [--synchronize-each-layer]
0: [--profile-backward] [--deepspeed]
0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
7: [--hidden-size HIDDEN_SIZE]
7: [--ffn-hidden-size FFN_HIDDEN_SIZE]
7: [--num-attention-heads NUM_ATTENTION_HEADS]
7: [--kv-channels KV_CHANNELS]
7: [--max-position-embeddings
MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 
7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: 
[--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 
7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 
ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: 
[--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: 
[--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] 
[--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] 
[--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: 
[--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: 
[--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type 
{PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples 
LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate 
SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: 
1: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
1: [--scattered-embeddings] [--split-transformers]
1: [--memory-centric-tiled-linear]
1: [--tile-factor TILE_FACTOR]
1: [--deepspeed-activation-checkpointing]
1: [--partition-activations] [--contigious-checkpointing]
1: [--checkpoint-in-cpu] [--synchronize-each-layer]
1: [--profile-backward] [--deepspeed]
1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
1: [--hidden-size HIDDEN_SIZE]
1: [--ffn-hidden-size FFN_HIDDEN_SIZE]
1: [--num-attention-heads NUM_ATTENTION_HEADS]
1: [--kv-channels KV_CHANNELS]
1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
1: [--layernorm-epsilon LAYERNORM_EPSILON]
1: [--sync-tp-duplicated-parameters]
1: [--apply-residual-connection-post-layernorm]
1: [--embed-layernorm] [--openai-gelu]
1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
1: [--glu-activation {geglu,liglu,reglu,swiglu}]
1: [--kill-switch-path KILL_SWITCH_PATH]
1: [--log-level {debug,info,warning,error,critical}]
1: [--log-level-replica {debug,info,warning,error,critical}]
1: [--attention-dropout ATTENTION_DROPOUT]
1: [--hidden-dropout HIDDEN_DROPOUT]
1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
1: [--micro-batch-size MICRO_BATCH_SIZE]
[--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5: [--kv-channels KV_CHANNELS] 5: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 5: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 5: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 5: 
[--layernorm-epsilon LAYERNORM_EPSILON] 5: [--sync-tp-duplicated-parameters] 5: [--apply-residual-connection-post-layernorm] 5: [--embed-layernorm] [--openai-gelu] 5: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 5: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 5: [--glu-activation {geglu,liglu,reglu,swiglu}] 5: [--kill-switch-path KILL_SWITCH_PATH] 5: [--log-level {debug,info,warning,error,critical}] 5: [--log-level-replica {debug,info,warning,error,critical}] 5: [--attention-dropout ATTENTION_DROPOUT] 5: [--hidden-dropout HIDDEN_DROPOUT] 5: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 5: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 5: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 5: [--micro-batch-size MICRO_BATCH_SIZE] 5: [--batch-size BATCH_SIZE] 5: [--global-batch-size GLOBAL_BATCH_SIZE] 5: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 5: [--checkpoint-activations] 5: [--distribute-checkpointed-activations] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 
1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 5: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 5: [--train-iters TRAIN_ITERS] 5: [--train-samples TRAIN_SAMPLES] 5: [--train-tokens TRAIN_TOKENS] 5: [--log-interval LOG_INTERVAL] 5: [--exit-interval EXIT_INTERVAL] 5: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 5: [--tensorboard-dir TENSORBOARD_DIR] 5: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 5: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 5: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 5: [--use-bnb-optimizer] 5: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 5: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 5: [--eval-only EVAL_ONLY] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 5: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 5: [--inference] 5: [--abort-on-unmet-fused-kernel-constraints] 5: [--pp-partition-method PP_PARTITION_METHOD] 5: [--seed SEED] [--init-method-std 
INIT_METHOD_STD] 5: [--init-method-xavier-uniform] [--lr LR] 5: [--lr-decay-style {constant,linear,cosine}] 5: [--lr-decay-iters LR_DECAY_ITERS] 5: [--lr-decay-samples LR_DECAY_SAMPLES] 5: [--lr-decay-tokens LR_DECAY_TOKENS] 5: [--lr-warmup-fraction LR_WARMUP_FRACTION] 5: [--lr-warmup-iters LR_WARMUP_ITERS] 5: [--lr-warmup-samples LR_WARMUP_SAMPLES] 5: [--warmup WARMUP] [--min-lr MIN_LR] 5: [--override-lr-scheduler] 5: [--use-checkpoint-lr-scheduler] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 5: [--universal-checkpoint] [--save SAVE] 5: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 5: [--no-save-rng] [--load LOAD] [--no-load-optim] 5: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 5: [--loss-scale LOSS_SCALE] 5: [--initial-loss-scale INITIAL_LOSS_SCALE] 5: [--min-loss-scale MIN_LOSS_SCALE] 5: [--loss-scale-window LOSS_SCALE_WINDOW] 5: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 5: [--no-query-key-layer-scaling] 5: [--attention-softmax-in-fp32] 5: [--accumulate-allreduce-grads-in-fp32] 5: [--fp16-lm-cross-entropy] 5: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 5: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 5: [--model-parallel-size MODEL_PARALLEL_SIZE] 5: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 5: [--distributed-backend {nccl,gloo}] 5: [--DDP-impl {local,torch}] 5: [--use-contiguous-buffers-in-ddp] 5: [--no-scatter-gather-tensors-in-pipeline] 5: [--local_rank LOCAL_RANK] 5: [--lazy-mpu-init LAZY_MPU_INIT] 5: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 5: [--eval-interval EVAL_INTERVAL] 5: [--data-path [DATA_PATH ...]] [--split SPLIT] 5: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 5: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 5: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 5: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 5: 
[--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 5: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 5: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 5: [--merge-file MERGE_FILE] 5: [--vocab-extra-ids VOCAB_EXTRA_IDS] 5: [--seq-length SEQ_LENGTH] 5: [--encoder-seq-length ENCODER_SEQ_LENGTH] 5: [--decoder-seq-length DECODER_SEQ_LENGTH] 5: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 5: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 5: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 5: [--num-workers NUM_WORKERS] 5: [--valid-num-workers VALID_NUM_WORKERS] 5: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 5: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: 
[--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 5: [--data-impl {lazy,cached,mmap,infer}] 5: [--reset-position-ids] [--reset-attention-mask] 5: [--eod-mask-loss] [--loss-on-targets-only] 5: [--reweight-loss-based-on-position-frequency] 5: [--noise-density NOISE_DENSITY] 5: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 5: [--adlr-autoresume] 5: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 5: [--ict-head-size ICT_HEAD_SIZE] 5: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 5: [--biencoder-shared-query-context-model] 5: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 5: [--titles-data-path TITLES_DATA_PATH] 5: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 5: [--use-one-sent-docs] 5: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 5: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 5: [--retriever-score-scaling] 5: [--block-data-path BLOCK_DATA_PATH] 5: [--embedding-path EMBEDDING_PATH] 5: [--indexer-batch-size INDEXER_BATCH_SIZE] 5: [--indexer-log-interval INDEXER_LOG_INTERVAL] 5: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 5: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 5: [--log-params-norm] [--log-num-zeros-in-grad] 5: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 5: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 5: [--log-timers-to-tensorboard] 5: [--log-batch-size-to-tensorboard] 5: [--no-log-learnig-rate-to-tensorboard] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 5: [--no-log-loss-scale-to-tensorboard] 5: [--log-validation-ppl-to-tensorboard] 5: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 5: [--zero-contigious-gradients] 5: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 5: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 5: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 5: [--scattered-embeddings] [--split-transformers] 5: [--memory-centric-tiled-linear] 5: [--tile-factor TILE_FACTOR] 5: [--deepspeed-activation-checkpointing] 5: [--partition-activations] [--contigious-checkpointing] 5: [--checkpoint-in-cpu] [--synchronize-each-layer] 5: [--profile-backward] [--deepspeed] 5: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 5: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: 
[--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: 
[--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank 
LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES 
[RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: 
[--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples 
LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length 
SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] 
[--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum 
SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: 
[--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 
1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: 
usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: [--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 
2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths 
[VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: 
[--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type 
{PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: [--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples 
LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate 
SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: 
[--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: 
[--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: 
[--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load 
BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5: [--kv-channels KV_CHANNELS] 5: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 5: [--make-vocab-size-divisible-by 
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5:
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3:
[--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples 
LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length 
SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] 
[--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: 
[--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: 
[--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: 
[--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: 
[--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: 
[--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path 
VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size 
TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: 
[--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: 
[--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl 
{lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: 
[--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: 
[--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: 
[--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: 
[--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: 
[--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: 
[--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: 
[--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size 
ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers 
CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: 
[--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: 
[--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 4: [--hidden-size HIDDEN_SIZE] 4: [--ffn-hidden-size FFN_HIDDEN_SIZE] 4: [--num-attention-heads NUM_ATTENTION_HEADS] 4: [--kv-channels KV_CHANNELS] 4: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 4: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 4: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 4: [--layernorm-epsilon LAYERNORM_EPSILON] 4: [--sync-tp-duplicated-parameters] 4: [--apply-residual-connection-post-layernorm] 4: [--embed-layernorm] [--openai-gelu] 4: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 4: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 4: [--glu-activation {geglu,liglu,reglu,swiglu}] 4: [--kill-switch-path KILL_SWITCH_PATH] 4: [--log-level {debug,info,warning,error,critical}] 4: [--log-level-replica {debug,info,warning,error,critical}] 4: [--attention-dropout ATTENTION_DROPOUT] 4: [--hidden-dropout HIDDEN_DROPOUT] 4: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 4: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 4: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 4: [--micro-batch-size MICRO_BATCH_SIZE] 4: [--batch-size BATCH_SIZE] 4: [--global-batch-size GLOBAL_BATCH_SIZE] 4: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: [--checkpoint-activations] 4: [--distribute-checkpointed-activations] 4: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 4: [--train-iters TRAIN_ITERS] 4: [--train-samples TRAIN_SAMPLES] 4: [--train-tokens TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] 
[--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] [--no-save-optim] 4: [--no-save-rng] [--load LOAD] [--no-load-optim] 4: [--no-load-rng] [--finetune] [--fp16] [--bf16] 4: [--loss-scale LOSS_SCALE] 4: [--initial-loss-scale INITIAL_LOSS_SCALE] 4: [--min-loss-scale MIN_LOSS_SCALE] 4: [--loss-scale-window LOSS_SCALE_WINDOW] 4: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 4: [--no-query-key-layer-scaling] 4: [--attention-softmax-in-fp32] 4: [--accumulate-allreduce-grads-in-fp32] 4: [--fp16-lm-cross-entropy] 4: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 4: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 4: [--model-parallel-size MODEL_PARALLEL_SIZE] 4: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 4: [--distributed-backend {nccl,gloo}] 4: [--DDP-impl {local,torch}] 4: [--use-contiguous-buffers-in-ddp] 4: [--no-scatter-gather-tensors-in-pipeline] 4: [--local_rank LOCAL_RANK] 4: [--lazy-mpu-init LAZY_MPU_INIT] 4: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 4: [--eval-interval EVAL_INTERVAL] 4: [--data-path [DATA_PATH ...]] [--split SPLIT] 4: [--train-weighted-split-paths 
[TRAIN_WEIGHTED_SPLIT_PATHS ...]] 4: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 4: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 4: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 4: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 4: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 4: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 4: [--merge-file MERGE_FILE] 4: [--vocab-extra-ids VOCAB_EXTRA_IDS] 4: [--seq-length SEQ_LENGTH] 4: [--encoder-seq-length ENCODER_SEQ_LENGTH] 4: [--decoder-seq-length DECODER_SEQ_LENGTH] 4: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 4: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 4: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 4: [--num-workers NUM_WORKERS] 4: [--valid-num-workers VALID_NUM_WORKERS] 4: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 4: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 4: [--data-impl {lazy,cached,mmap,infer}] 4: [--reset-position-ids] [--reset-attention-mask] 4: [--eod-mask-loss] [--loss-on-targets-only] 4: [--reweight-loss-based-on-position-frequency] 4: [--noise-density NOISE_DENSITY] 4: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 4: [--adlr-autoresume] 4: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 4: [--ict-head-size ICT_HEAD_SIZE] 4: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 4: [--biencoder-shared-query-context-model] 4: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 4: [--titles-data-path TITLES_DATA_PATH] 4: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 4: [--use-one-sent-docs] 4: [--evidence-data-path EVIDENCE_DATA_PATH] 4: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 4: [--retriever-score-scaling] 4: [--block-data-path BLOCK_DATA_PATH] 4: [--embedding-path EMBEDDING_PATH] 4: [--indexer-batch-size INDEXER_BATCH_SIZE] 4: [--indexer-log-interval 
INDEXER_LOG_INTERVAL] 4: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 4: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 4: [--log-params-norm] [--log-num-zeros-in-grad] 4: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 4: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 4: [--log-timers-to-tensorboard] 4: [--log-batch-size-to-tensorboard] 4: [--no-log-learnig-rate-to-tensorboard] 4: [--no-log-loss-scale-to-tensorboard] 4: [--log-validation-ppl-to-tensorboard] 4: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 4: [--zero-contigious-gradients] 4: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 4: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 4: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 4: [--scattered-embeddings] [--split-transformers] 4: [--memory-centric-tiled-linear] 4: [--tile-factor TILE_FACTOR] 4: [--deepspeed-activation-checkpointing] 4: [--partition-activations] [--contigious-checkpointing] 4: [--checkpoint-in-cpu] [--synchronize-each-layer] 4: [--profile-backward] [--deepspeed] 4: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 4: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 4: [--hidden-size HIDDEN_SIZE] 4: [--ffn-hidden-size FFN_HIDDEN_SIZE] 4: [--num-attention-heads NUM_ATTENTION_HEADS] 4: [--kv-channels KV_CHANNELS] 4: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 4: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 4: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 4: [--layernorm-epsilon LAYERNORM_EPSILON] 4: [--sync-tp-duplicated-parameters] 4: [--apply-residual-connection-post-layernorm] 4: [--embed-layernorm] [--openai-gelu] 4: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 4: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 4: [--glu-activation {geglu,liglu,reglu,swiglu}] 4: [--kill-switch-path KILL_SWITCH_PATH] 4: [--log-level 
{debug,info,warning,error,critical}] 4: [--log-level-replica {debug,info,warning,error,critical}] 4: [--attention-dropout ATTENTION_DROPOUT] 4: [--hidden-dropout HIDDEN_DROPOUT] 4: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 4: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 4: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 4: [--micro-batch-size MICRO_BATCH_SIZE] 4: [--batch-size BATCH_SIZE] 4: [--global-batch-size GLOBAL_BATCH_SIZE] 4: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: [--checkpoint-activations] 4: [--distribute-checkpointed-activations] 4: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 4: [--train-iters TRAIN_ITERS] 4: [--train-samples TRAIN_SAMPLES] 4: [--train-tokens TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] 
4:                        [--no-save-optim]
4:                        [--no-save-rng] [--load LOAD] [--no-load-optim]
4:                        [--no-load-rng] [--finetune] [--fp16] [--bf16]
4:                        [--loss-scale LOSS_SCALE]
4:                        [--initial-loss-scale INITIAL_LOSS_SCALE]
4:                        [--min-loss-scale MIN_LOSS_SCALE]
4:                        [--loss-scale-window LOSS_SCALE_WINDOW]
4:                        [--hysteresis HYSTERESIS] [--fp32-residual-connection]
4:                        [--no-query-key-layer-scaling]
4:                        [--attention-softmax-in-fp32]
4:                        [--accumulate-allreduce-grads-in-fp32]
4:                        [--fp16-lm-cross-entropy]
4:                        [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
4:                        [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
4:                        [--model-parallel-size MODEL_PARALLEL_SIZE]
4:                        [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
4:                        [--distributed-backend {nccl,gloo}]
4:                        [--DDP-impl {local,torch}]
4:                        [--use-contiguous-buffers-in-ddp]
4:                        [--no-scatter-gather-tensors-in-pipeline]
4:                        [--local_rank LOCAL_RANK]
4:                        [--lazy-mpu-init LAZY_MPU_INIT]
4:                        [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
4:                        [--eval-interval EVAL_INTERVAL]
4:                        [--data-path [DATA_PATH ...]] [--split SPLIT]
4:                        [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
4:                        [--merge-file MERGE_FILE]
4:                        [--vocab-extra-ids VOCAB_EXTRA_IDS]
4:                        [--seq-length SEQ_LENGTH]
4:                        [--encoder-seq-length ENCODER_SEQ_LENGTH]
4:                        [--decoder-seq-length DECODER_SEQ_LENGTH]
4:                        [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
4:                        [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
4:                        [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
4:                        [--num-workers NUM_WORKERS]
4:                        [--valid-num-workers VALID_NUM_WORKERS]
4:                        [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
4:                        [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
4:                        [--data-impl {lazy,cached,mmap,infer}]
4:                        [--reset-position-ids] [--reset-attention-mask]
4:                        [--eod-mask-loss] [--loss-on-targets-only]
4:                        [--reweight-loss-based-on-position-frequency]
4:                        [--noise-density NOISE_DENSITY]
4:                        [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
4:                        [--adlr-autoresume]
4:                        [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
4:                        [--ict-head-size ICT_HEAD_SIZE]
4:                        [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
4:                        [--biencoder-shared-query-context-model]
4:                        [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
4:                        [--titles-data-path TITLES_DATA_PATH]
4:                        [--query-in-block-prob QUERY_IN_BLOCK_PROB]
4:                        [--use-one-sent-docs]
4:                        [--evidence-data-path EVIDENCE_DATA_PATH]
4:                        [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
4:                        [--retriever-score-scaling]
4:                        [--block-data-path BLOCK_DATA_PATH]
4:                        [--embedding-path EMBEDDING_PATH]
4:                        [--indexer-batch-size INDEXER_BATCH_SIZE]
4:                        [--indexer-log-interval INDEXER_LOG_INTERVAL]
4:                        [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
4:                        [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
4:                        [--log-params-norm] [--log-num-zeros-in-grad]
4:                        [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
4:                        [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
4:                        [--log-timers-to-tensorboard]
4:                        [--log-batch-size-to-tensorboard]
4:                        [--no-log-learnig-rate-to-tensorboard]
4:                        [--no-log-loss-scale-to-tensorboard]
4:                        [--log-validation-ppl-to-tensorboard]
4:                        [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
4:                        [--zero-contigious-gradients]
4:                        [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
4:                        [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
4:                        [--remote-device {none,cpu,nvme}] [--use-pin-memory]
4:                        [--scattered-embeddings] [--split-transformers]
4:                        [--memory-centric-tiled-linear]
4:                        [--tile-factor TILE_FACTOR]
4:                        [--deepspeed-activation-checkpointing]
4:                        [--partition-activations] [--contigious-checkpointing]
4:                        [--checkpoint-in-cpu] [--synchronize-each-layer]
4:                        [--profile-backward] [--deepspeed]
4:                        [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
4:                        [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
4: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
4:                        [--hidden-size HIDDEN_SIZE]
4:                        [--ffn-hidden-size FFN_HIDDEN_SIZE]
4:                        [--num-attention-heads NUM_ATTENTION_HEADS]
4:                        [--kv-channels KV_CHANNELS]
4:                        [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
4:                        [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
4:                        [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
4:                        [--layernorm-epsilon LAYERNORM_EPSILON]
4:                        [--sync-tp-duplicated-parameters]
4:                        [--apply-residual-connection-post-layernorm]
4:                        [--embed-layernorm] [--openai-gelu]
4:                        [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
4:                        [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
4:                        [--glu-activation {geglu,liglu,reglu,swiglu}]
4:                        [--kill-switch-path KILL_SWITCH_PATH]
4:                        [--log-level {debug,info,warning,error,critical}]
4:                        [--log-level-replica {debug,info,warning,error,critical}]
4:                        [--attention-dropout ATTENTION_DROPOUT]
4:                        [--hidden-dropout HIDDEN_DROPOUT]
4:                        [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
4:                        [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
4:                        [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
4:                        [--micro-batch-size MICRO_BATCH_SIZE]
4:                        [--batch-size BATCH_SIZE]
4:                        [--global-batch-size GLOBAL_BATCH_SIZE]
4:                        [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
4:                        [--checkpoint-activations]
4:                        [--distribute-checkpointed-activations]
4:                        [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
4:                        [--train-iters TRAIN_ITERS]
4:                        [--train-samples TRAIN_SAMPLES]
4:                        [--train-tokens
TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] [--no-save-optim] 4: [--no-save-rng] [--load LOAD] [--no-load-optim] 4: [--no-load-rng] [--finetune] [--fp16] [--bf16] 4: [--loss-scale LOSS_SCALE] 4: [--initial-loss-scale INITIAL_LOSS_SCALE] 4: [--min-loss-scale MIN_LOSS_SCALE] 4: [--loss-scale-window LOSS_SCALE_WINDOW] 4: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 4: [--no-query-key-layer-scaling] 4: [--attention-softmax-in-fp32] 4: [--accumulate-allreduce-grads-in-fp32] 4: [--fp16-lm-cross-entropy] 4: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 4: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 4: [--model-parallel-size MODEL_PARALLEL_SIZE] 4: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 
4: [--distributed-backend {nccl,gloo}] 4: [--DDP-impl {local,torch}] 4: [--use-contiguous-buffers-in-ddp] 4: [--no-scatter-gather-tensors-in-pipeline] 4: [--local_rank LOCAL_RANK] 4: [--lazy-mpu-init LAZY_MPU_INIT] 4: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 4: [--eval-interval EVAL_INTERVAL] 4: [--data-path [DATA_PATH ...]] [--split SPLIT] 4: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 4: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 4: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 4: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 4: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 4: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 4: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 4: [--merge-file MERGE_FILE] 4: [--vocab-extra-ids VOCAB_EXTRA_IDS] 4: [--seq-length SEQ_LENGTH] 4: [--encoder-seq-length ENCODER_SEQ_LENGTH] 4: [--decoder-seq-length DECODER_SEQ_LENGTH] 4: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 4: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 4: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 4: [--num-workers NUM_WORKERS] 4: [--valid-num-workers VALID_NUM_WORKERS] 4: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 4: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 4: [--data-impl {lazy,cached,mmap,infer}] 4: [--reset-position-ids] [--reset-attention-mask] 4: [--eod-mask-loss] [--loss-on-targets-only] 4: [--reweight-loss-based-on-position-frequency] 4: [--noise-density NOISE_DENSITY] 4: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 4: [--adlr-autoresume] 4: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 4: [--ict-head-size ICT_HEAD_SIZE] 4: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 4: [--biencoder-shared-query-context-model] 4: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 4: [--titles-data-path TITLES_DATA_PATH] 4: [--query-in-block-prob 
QUERY_IN_BLOCK_PROB] 4: [--use-one-sent-docs] 4: [--evidence-data-path EVIDENCE_DATA_PATH] 4: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 4: [--retriever-score-scaling] 4: [--block-data-path BLOCK_DATA_PATH] 4: [--embedding-path EMBEDDING_PATH] 4: [--indexer-batch-size INDEXER_BATCH_SIZE] 4: [--indexer-log-interval INDEXER_LOG_INTERVAL] 4: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 4: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 4: [--log-params-norm] [--log-num-zeros-in-grad] 4: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 4: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 4: [--log-timers-to-tensorboard] 4: [--log-batch-size-to-tensorboard] 4: [--no-log-learnig-rate-to-tensorboard] 4: [--no-log-loss-scale-to-tensorboard] 4: [--log-validation-ppl-to-tensorboard] 4: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 4: [--zero-contigious-gradients] 4: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 4: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 4: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 4: [--scattered-embeddings] [--split-transformers] 4: [--memory-centric-tiled-linear] 4: [--tile-factor TILE_FACTOR] 4: [--deepspeed-activation-checkpointing] 4: [--partition-activations] [--contigious-checkpointing] 4: [--checkpoint-in-cpu] [--synchronize-each-layer] 4: [--profile-backward] [--deepspeed] 4: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 4: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 4: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 68510) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python 6: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 2252) of binary: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
5: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 40052) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
7: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 58809) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
0: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 55346) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
1: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 52416) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
4: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 120066) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
3: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 12206) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
3: Traceback (most recent call last):
3:   File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main
3:     return _run_code(code, main_globals, None,
3:   File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code
3:     exec(code, run_globals)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in <module>
3:     main()
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
3:     return f(*args, **kwargs)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main
3:     run(args)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
3:     elastic_launch(
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
3:     return launch_agent(self._config, self._entrypoint, list(args))
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
3:     raise ChildFailedError(
3: torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
3: ============================================================
3: Megatron-DeepSpeed/pretrain_gpt.py FAILED
3: ------------------------------------------------------------
3: Failures:
3:   [1]: time: 2023-04-24_12:09:29, host: nid006911, rank: 25 (local_rank: 1), exitcode: 2 (pid: 12207)
3:   [2]: time: 2023-04-24_12:09:29, host: nid006911, rank: 26 (local_rank: 2), exitcode: 2 (pid: 12208)
3:   [3]: time: 2023-04-24_12:09:29, host: nid006911, rank: 27 (local_rank: 3), exitcode: 2 (pid: 12209)
3:   [4]: time: 2023-04-24_12:09:29, host: nid006911, rank: 28 (local_rank: 4), exitcode: 2 (pid: 12210)
3:   [5]: time: 2023-04-24_12:09:29, host: nid006911, rank: 29 (local_rank: 5), exitcode: 2 (pid: 12211)
3:   [6]: time: 2023-04-24_12:09:29, host: nid006911, rank: 30 (local_rank: 6), exitcode: 2 (pid: 12212)
3:   [7]: time: 2023-04-24_12:09:29, host: nid006911, rank: 31 (local_rank: 7), exitcode: 2 (pid: 12213)
3: ------------------------------------------------------------
3: Root Cause (first observed failure):
3:   [0]: time: 2023-04-24_12:09:29, host: nid006911, rank: 24 (local_rank: 0), exitcode: 2 (pid: 12206)
3:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
3: ============================================================
1: ============================================================
1: Megatron-DeepSpeed/pretrain_gpt.py FAILED
1: ------------------------------------------------------------
1: Failures:
1:   [1]: time: 2023-04-24_12:09:29, host: nid006909, rank: 9 (local_rank: 1), exitcode: 2 (pid: 52417)
1:   [2]: time: 2023-04-24_12:09:29, host: nid006909, rank: 10 (local_rank: 2), exitcode: 2 (pid: 52418)
1:   [3]: time: 2023-04-24_12:09:29, host: nid006909, rank: 11 (local_rank: 3), exitcode: 2 (pid: 52419)
1:   [4]: time: 2023-04-24_12:09:29, host: nid006909, rank: 12 (local_rank: 4), exitcode: 2 (pid: 52420)
1:   [5]: time: 2023-04-24_12:09:29, host: nid006909, rank: 13 (local_rank: 5), exitcode: 2 (pid: 52421)
1:   [6]: time: 2023-04-24_12:09:29, host: nid006909, rank: 14 (local_rank: 6), exitcode: 2 (pid: 52422)
1:   [7]: time: 2023-04-24_12:09:29, host: nid006909, rank: 15 (local_rank: 7), exitcode: 2 (pid: 52423)
1: ------------------------------------------------------------
1: Root Cause (first observed failure):
1:   [0]: time: 2023-04-24_12:09:29, host: nid006909, rank: 8 (local_rank: 0), exitcode: 2 (pid: 52416)
1:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
1: ============================================================
2: ============================================================
2: Megatron-DeepSpeed/pretrain_gpt.py FAILED
2: ------------------------------------------------------------
2: Failures:
2:   [1]: time: 2023-04-24_12:09:29, host: nid006910, rank: 17 (local_rank: 1), exitcode: 2 (pid: 68511)
2:   [2]: time: 2023-04-24_12:09:29, host: nid006910, rank: 18 (local_rank: 2), exitcode: 2 (pid: 68512)
2:   [3]: time: 2023-04-24_12:09:29, host: nid006910, rank: 19 (local_rank: 3), exitcode: 2 (pid: 68513)
2:   [4]: time: 2023-04-24_12:09:29, host: nid006910, rank: 20 (local_rank: 4), exitcode: 2 (pid: 68514)
2:   [5]: time: 2023-04-24_12:09:29, host: nid006910, rank: 21 (local_rank: 5), exitcode: 2 (pid: 68515)
2:   [6]: time: 2023-04-24_12:09:29, host: nid006910, rank: 22 (local_rank: 6), exitcode: 2 (pid: 68516)
2:   [7]: time: 2023-04-24_12:09:29, host: nid006910, rank: 23 (local_rank: 7), exitcode: 2 (pid: 68517)
2: ------------------------------------------------------------
2: Root Cause (first observed failure):
2:   [0]: time: 2023-04-24_12:09:29, host: nid006910, rank: 16 (local_rank: 0), exitcode: 2 (pid: 68510)
2:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
2: ============================================================
5: ============================================================
5: Megatron-DeepSpeed/pretrain_gpt.py FAILED
5: ------------------------------------------------------------
5: Failures:
5:   [1]: time: 2023-04-24_12:09:29, host: nid006913, rank: 41 (local_rank: 1), exitcode: 2 (pid: 40053)
5:   [2]: time: 2023-04-24_12:09:29, host: nid006913, rank: 42 (local_rank: 2), exitcode: 2 (pid: 40054)
5:   [3]: time: 2023-04-24_12:09:29, host: nid006913, rank: 43 (local_rank: 3), exitcode: 2 (pid: 40055)
5:   [4]: time: 2023-04-24_12:09:29, host: nid006913, rank: 44 (local_rank: 4), exitcode: 2 (pid: 40056)
5:   [5]: time: 2023-04-24_12:09:29, host: nid006913, rank: 45 (local_rank: 5), exitcode: 2 (pid: 40057)
5:   [6]: time: 2023-04-24_12:09:29, host: nid006913, rank: 46 (local_rank: 6), exitcode: 2 (pid: 40058)
5:   [7]: time: 2023-04-24_12:09:29, host: nid006913, rank: 47 (local_rank: 7), exitcode: 2 (pid: 40059)
5: ------------------------------------------------------------
5: Root Cause (first observed failure):
5:   [0]: time: 2023-04-24_12:09:29, host: nid006913, rank: 40 (local_rank: 0), exitcode: 2 (pid: 40052)
5:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
5: ============================================================
4: ============================================================
4: Megatron-DeepSpeed/pretrain_gpt.py FAILED
4: ------------------------------------------------------------
4: Failures:
4:   [1]: time: 2023-04-24_12:09:29, host: nid006912, rank: 33 (local_rank: 1), exitcode: 2 (pid: 120067)
4:   [2]: time: 2023-04-24_12:09:29, host: nid006912, rank: 34 (local_rank: 2), exitcode: 2 (pid: 120068)
4:   [3]: time: 2023-04-24_12:09:29, host: nid006912, rank: 35 (local_rank: 3), exitcode: 2 (pid: 120069)
4:   [4]: time: 2023-04-24_12:09:29, host: nid006912, rank: 36 (local_rank: 4), exitcode: 2 (pid: 120076)
4:   [5]: time: 2023-04-24_12:09:29, host: nid006912, rank: 37 (local_rank: 5), exitcode: 2 (pid: 120078)
4:   [6]: time: 2023-04-24_12:09:29, host: nid006912, rank: 38 (local_rank: 6), exitcode: 2 (pid: 120079)
4:   [7]: time: 2023-04-24_12:09:29, host: nid006912, rank: 39 (local_rank: 7), exitcode: 2 (pid: 120080)
4: ------------------------------------------------------------
4: Root Cause (first observed failure):
4:   [0]: time: 2023-04-24_12:09:29, host: nid006912, rank: 32 (local_rank: 0), exitcode: 2 (pid: 120066)
4:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
4: ============================================================
0: ============================================================
0: Megatron-DeepSpeed/pretrain_gpt.py FAILED
0: ------------------------------------------------------------
0: Failures:
0:   [1]: time: 2023-04-24_12:09:29, host: nid006908, rank: 1 (local_rank: 1), exitcode: 2 (pid: 55347)
0:   [2]: time: 2023-04-24_12:09:29, host: nid006908, rank: 2 (local_rank: 2), exitcode: 2 (pid: 55348)
0:   [3]: time: 2023-04-24_12:09:29, host: nid006908, rank: 3 (local_rank: 3), exitcode: 2 (pid: 55349)
0:   [4]: time: 2023-04-24_12:09:29, host: nid006908, rank: 4 (local_rank: 4), exitcode: 2 (pid: 55350)
0:   [5]: time: 2023-04-24_12:09:29, host: nid006908, rank: 5 (local_rank: 5), exitcode: 2 (pid: 55351)
0:   [6]: time: 2023-04-24_12:09:29, host: nid006908, rank: 6 (local_rank: 6), exitcode: 2 (pid: 55352)
0:   [7]: time: 2023-04-24_12:09:29, host: nid006908, rank: 7 (local_rank: 7), exitcode: 2 (pid: 55353)
0: ------------------------------------------------------------
0: Root Cause (first observed failure):
0:   [0]: time: 2023-04-24_12:09:29, host: nid006908, rank: 0 (local_rank: 0), exitcode: 2 (pid: 55346)
0:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
0: ============================================================
7: ============================================================
7: Megatron-DeepSpeed/pretrain_gpt.py FAILED
7: ------------------------------------------------------------
7: Failures:
7:   [1]: time: 2023-04-24_12:09:29, host: nid006915, rank: 57 (local_rank: 1), exitcode: 2 (pid: 58810)
7:   [2]: time: 2023-04-24_12:09:29, host: nid006915, rank: 58 (local_rank: 2), exitcode: 2 (pid: 58811)
7:   [3]: time: 2023-04-24_12:09:29, host: nid006915, rank: 59 (local_rank: 3), exitcode: 2 (pid: 58812)
7:   [4]: time: 2023-04-24_12:09:29, host: nid006915, rank: 60 (local_rank: 4), exitcode: 2 (pid: 58813)
7:   [5]: time: 2023-04-24_12:09:29, host: nid006915, rank: 61 (local_rank: 5), exitcode: 2 (pid: 58814)
7:   [6]: time: 2023-04-24_12:09:29, host: nid006915, rank: 62 (local_rank: 6), exitcode: 2 (pid: 58815)
7:   [7]: time: 2023-04-24_12:09:29, host: nid006915, rank: 63 (local_rank: 7), exitcode: 2 (pid: 58816)
7: ------------------------------------------------------------
7: Root Cause (first observed failure):
7:   [0]: time :
2023-04-24_12:09:29 7: host : nid006915 7: rank : 56 (local_rank: 0) 7: exitcode : 2 (pid: 58809) 7: error_file: 7: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 7: ============================================================ 6: run(args) 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run 6: elastic_launch( 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ 6: return launch_agent(self._config, self._entrypoint, list(args)) 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent 6: raise ChildFailedError( 6: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 6: ============================================================ 6: Megatron-DeepSpeed/pretrain_gpt.py FAILED 6: ------------------------------------------------------------ 6: Failures: 6: [1]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 49 (local_rank: 1) 6: exitcode : 2 (pid: 2253) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [2]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 50 (local_rank: 2) 6: exitcode : 2 (pid: 2254) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [3]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 51 (local_rank: 3) 6: exitcode : 2 (pid: 2255) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [4]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 52 (local_rank: 4) 6: exitcode : 2 (pid: 2256) 6: error_file: 6: traceback : To enable traceback see: 
https://pytorch.org/docs/stable/elastic/errors.html 6: [5]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 53 (local_rank: 5) 6: exitcode : 2 (pid: 2258) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [6]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 54 (local_rank: 6) 6: exitcode : 2 (pid: 2259) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [7]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 55 (local_rank: 7) 6: exitcode : 2 (pid: 2260) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: ------------------------------------------------------------ 6: Root Cause (first observed failure): 6: [0]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 48 (local_rank: 0) 6: exitcode : 2 (pid: 2252) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: ============================================================ srun: error: nid006915: task 7: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=3406547.0 srun: error: nid006911: task 3: Exited with exit code 1 srun: error: nid006912: task 4: Exited with exit code 1 srun: error: nid006909: task 1: Exited with exit code 1 srun: error: nid006913: task 5: Exited with exit code 1 0: slurmstepd: error: *** STEP 3406547.0 ON nid006908 CANCELLED AT 2023-04-24T12:09:30 *** srun: error: nid006910: task 2: Exited with exit code 1 srun: error: nid006914: task 6: Exited with exit code 1 srun: error: nid006908: task 0: Terminated srun: Force Terminated StepId=3406547.0
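The `0:`/`4:` prefixes on each line come from srun's `--label` (`-l`) option, which tags every stdout line with its originating task number, and every torchelastic failure record shares the same host/rank/exitcode/pid shape. A stdlib-only sketch of how such a log can be de-interleaved and tabulated for triage (`stream_for_task` and `parse_failures` are hypothetical helpers, not part of torch, Slurm, or Megatron-DeepSpeed):

```python
import re

# Matches the torchelastic ChildFailedError failure-record layout seen in
# the log above, after the srun task labels have been stripped.
RECORD = re.compile(
    r"host\s+:\s+(?P<host>\S+)\s*\n"
    r"\s*rank\s+:\s+(?P<rank>\d+)\s+\(local_rank:\s*(?P<local_rank>\d+)\)\s*\n"
    r"\s*exitcode\s+:\s+(?P<exitcode>-?\d+)\s+\(pid:\s*(?P<pid>\d+)\)"
)

def stream_for_task(log: str, task: int) -> str:
    """Recover one task's stdout from an srun --label interleaved log."""
    prefix = f"{task}: "
    return "\n".join(
        line[len(prefix):] for line in log.splitlines() if line.startswith(prefix)
    )

def parse_failures(text: str) -> list[dict]:
    """Return one dict per failure record found in the de-interleaved text."""
    return [m.groupdict() for m in RECORD.finditer(text)]

# Tiny excerpt in the shape of the log above (tasks 0 and 7 interleaved):
log = """\
0:   host      : nid006908
7:   host      : nid006915
0:   rank      : 1 (local_rank: 1)
0:   exitcode  : 2 (pid: 55347)
"""

for rec in parse_failures(stream_for_task(log, 0)):
    print(rec["host"], rec["rank"], rec["exitcode"], rec["pid"])
```

Filtering by label first matters: records interrupted by another task's lines (as throughout the log above) would otherwise break the multi-line regex match.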