4: Lmod has detected the following error: The following module(s) are unknown:
4: "suse-repo-deps/sam-default"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "suse-repo-deps/sam-default"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "rocm/sam-5.2.3"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "rocm/sam-5.2.3"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "rccl/sam-develop"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "rccl/sam-develop"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

4: Lmod has detected the following error: The following module(s) are unknown:
4: "aws-ofi-rccl/sam-default"
4:
4: Please check the spelling or version number. Also try "module spider ..."
4: It is also possible your cache file is out-of-date; it may help to try:
4:   $ module --ignore-cache load "aws-ofi-rccl/sam-default"
4:
4: Also make sure that all modulefiles written in TCL start with the string
4: #%Module

[each of the four Lmod errors above is emitted once by every rank 0-7]

0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory

[this warning occurs many times on every rank 0-7]

2: 2023-04-24 12:06:52.057328: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
2: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

[this informational message occurs several times, with varying timestamps, on ranks 2-7]
7: 2023-04-24 12:06:52.058818: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 7: 2023-04-24 12:06:52.058822: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: 2023-04-24 12:06:52.058787: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058798: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058806: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 7: 2023-04-24 12:06:52.058831: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
7: 2023-04-24 12:06:52.058838: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 7: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058816: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058824: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 0: 2023-04-24 12:06:52.058836: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
0: 2023-04-24 12:06:52.058840: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 0: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059041: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059057: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059056: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
1: 2023-04-24 12:06:52.059046: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059073: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059070: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 1: 2023-04-24 12:06:52.059101: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA 1: To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
3: 2023-04-24 12:07:11.205858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 3: 2023-04-24 12:07:11.218934: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 5: 2023-04-24 12:07:11.206332: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
/pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:11.206325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206441: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206378: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206450: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:11.206579: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 6: 2023-04-24 12:07:11.206450: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:11.206470: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:11.219366: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 
1: 2023-04-24 12:07:46.584337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
1: 2023-04-24 12:07:46.585509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
1: 2023-04-24 12:07:46.585527: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
0: 2023-04-24 12:07:46.585619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585528: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585638: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585529: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585530: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 1: 2023-04-24 12:07:46.585530: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585664: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:46.585534: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
1: 2023-04-24 12:07:46.585536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 1: 2023-04-24 12:07:46.585549: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585674: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585862: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585863: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585690: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 0: 2023-04-24 12:07:46.585702: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 0: 2023-04-24 12:07:46.585713: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585867: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585869: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585884: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585966: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.585986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
7: 2023-04-24 12:07:46.585994: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 7: 2023-04-24 12:07:46.585987: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 7: 2023-04-24 12:07:46.586006: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585908: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585908: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585909: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585911: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585931: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.585932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
5: 2023-04-24 12:07:46.586043: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.586049: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 5: 2023-04-24 12:07:46.586063: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 5: 2023-04-24 12:07:46.586066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
4: 2023-04-24 12:07:46.591246: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591310: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591321: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591339: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591361: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591366: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.591445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592570: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592571: W 
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592572: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592575: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592578: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No 
such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592580: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592581: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 4: 2023-04-24 12:07:46.592603: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. 
4: 2023-04-24 12:07:46.592605: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
3: 2023-04-24 12:07:46.633248: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
3: 2023-04-24 12:07:46.634425: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pfs/lustrep2/projappl/project_462000125/samantao-public/apps-rocm-5.2.3/aws-ofi-rccl:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/opt/cray/libfabric/1.15.2.0/lib64
7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
                       [--hidden-size HIDDEN_SIZE]
                       [--ffn-hidden-size FFN_HIDDEN_SIZE]
                       [--num-attention-heads NUM_ATTENTION_HEADS]
                       [--kv-channels KV_CHANNELS]
                       [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
                       [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
                       [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
                       [--layernorm-epsilon LAYERNORM_EPSILON]
                       [--sync-tp-duplicated-parameters]
                       [--apply-residual-connection-post-layernorm]
                       [--embed-layernorm] [--openai-gelu]
                       [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
                       [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
                       [--glu-activation {geglu,liglu,reglu,swiglu}]
                       [--kill-switch-path KILL_SWITCH_PATH]
                       [--log-level {debug,info,warning,error,critical}]
                       [--log-level-replica {debug,info,warning,error,critical}]
                       [--attention-dropout ATTENTION_DROPOUT]
                       [--hidden-dropout HIDDEN_DROPOUT]
                       [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
                       [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
                       [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
                       [--micro-batch-size MICRO_BATCH_SIZE]
                       [--batch-size BATCH_SIZE]
                       [--global-batch-size GLOBAL_BATCH_SIZE]
                       [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
                       [--checkpoint-activations]
                       [--distribute-checkpointed-activations]
                       [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
                       [--train-iters TRAIN_ITERS]
                       [--train-samples TRAIN_SAMPLES]
                       [--train-tokens TRAIN_TOKENS]
                       [--log-interval LOG_INTERVAL]
                       [--exit-interval EXIT_INTERVAL]
                       [--exit-duration-in-mins EXIT_DURATION_IN_MINS]
                       [--tensorboard-dir TENSORBOARD_DIR]
                       [--no-masked-softmax-fusion] [--no-bias-gelu-fusion]
                       [--no-bias-dropout-fusion] [--no-layer-norm-fusion]
                       [--no-optimizer-fusion] [--optimizer {adam,sgd}]
                       [--use-bnb-optimizer]
                       [--dataloader-type {single,cyclic}] [--cpu-optimizer]
                       [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR]
                       [--eval-only EVAL_ONLY]
                       [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]]
                       [--inference]
                       [--abort-on-unmet-fused-kernel-constraints]
                       [--pp-partition-method PP_PARTITION_METHOD]
                       [--seed SEED] [--init-method-std INIT_METHOD_STD]
                       [--init-method-xavier-uniform] [--lr LR]
                       [--lr-decay-style {constant,linear,cosine}]
                       [--lr-decay-iters LR_DECAY_ITERS]
                       [--lr-decay-samples LR_DECAY_SAMPLES]
                       [--lr-decay-tokens LR_DECAY_TOKENS]
                       [--lr-warmup-fraction LR_WARMUP_FRACTION]
                       [--lr-warmup-iters LR_WARMUP_ITERS]
                       [--lr-warmup-samples LR_WARMUP_SAMPLES]
                       [--warmup WARMUP] [--min-lr MIN_LR]
                       [--override-lr-scheduler]
                       [--use-checkpoint-lr-scheduler]
                       [--universal-checkpoint] [--save SAVE]
                       [--save-interval SAVE_INTERVAL] [--no-save-optim]
                       [--no-save-rng] [--load LOAD] [--no-load-optim]
                       [--no-load-rng] [--finetune] [--fp16] [--bf16]
                       [--loss-scale LOSS_SCALE]
                       [--initial-loss-scale INITIAL_LOSS_SCALE]
                       [--min-loss-scale MIN_LOSS_SCALE]
                       [--loss-scale-window LOSS_SCALE_WINDOW]
                       [--hysteresis HYSTERESIS] [--fp32-residual-connection]
                       [--no-query-key-layer-scaling]
                       [--attention-softmax-in-fp32]
                       [--accumulate-allreduce-grads-in-fp32]
                       [--fp16-lm-cross-entropy]
                       [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
                       [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
                       [--model-parallel-size MODEL_PARALLEL_SIZE]
                       [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
                       [--distributed-backend {nccl,gloo}]
                       [--DDP-impl {local,torch}]
                       [--use-contiguous-buffers-in-ddp]
                       [--no-scatter-gather-tensors-in-pipeline]
                       [--local_rank LOCAL_RANK]
                       [--lazy-mpu-init LAZY_MPU_INIT]
                       [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
                       [--eval-interval EVAL_INTERVAL]
                       [--data-path [DATA_PATH ...]] [--split SPLIT]
                       [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
                       [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
                       [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
                       [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
                       [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
                       [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
                       [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
                       [--merge-file MERGE_FILE]
                       [--vocab-extra-ids VOCAB_EXTRA_IDS]
                       [--seq-length SEQ_LENGTH]
                       [--encoder-seq-length ENCODER_SEQ_LENGTH]
                       [--decoder-seq-length DECODER_SEQ_LENGTH]
                       [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
                       [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
                       [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
                       [--num-workers NUM_WORKERS]
                       [--valid-num-workers VALID_NUM_WORKERS]
                       [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
                       [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
                       [--data-impl {lazy,cached,mmap,infer}]
                       [--reset-position-ids] [--reset-attention-mask]
                       [--eod-mask-loss] [--loss-on-targets-only]
                       [--reweight-loss-based-on-position-frequency]
                       [--noise-density NOISE_DENSITY]
                       [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
                       [--adlr-autoresume]
                       [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
                       [--ict-head-size ICT_HEAD_SIZE]
                       [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
                       [--biencoder-shared-query-context-model]
                       [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
                       [--titles-data-path TITLES_DATA_PATH]
                       [--query-in-block-prob QUERY_IN_BLOCK_PROB]
                       [--use-one-sent-docs]
                       [--evidence-data-path EVIDENCE_DATA_PATH]
                       [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
                       [--retriever-score-scaling]
                       [--block-data-path BLOCK_DATA_PATH]
                       [--embedding-path EMBEDDING_PATH]
                       [--indexer-batch-size INDEXER_BATCH_SIZE]
                       [--indexer-log-interval INDEXER_LOG_INTERVAL]
                       [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
                       [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
                       [--log-params-norm] [--log-num-zeros-in-grad]
                       [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
                       [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
                       [--log-timers-to-tensorboard]
                       [--log-batch-size-to-tensorboard]
                       [--no-log-learnig-rate-to-tensorboard]
                       [--no-log-loss-scale-to-tensorboard]
                       [--log-validation-ppl-to-tensorboard]
                       [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
                       [--zero-contigious-gradients]
                       [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
                       [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
                       [--remote-device {none,cpu,nvme}] [--use-pin-memory]
                       [--scattered-embeddings] [--split-transformers]
                       [--memory-centric-tiled-linear]
                       [--tile-factor TILE_FACTOR]
                       [--deepspeed-activation-checkpointing]
                       [--partition-activations] [--contigious-checkpointing]
                       [--checkpoint-in-cpu] [--synchronize-each-layer]
                       [--profile-backward] [--deepspeed]
                       [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
                       [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
pretrain_gpt.py: error: unrecognized arguments: --reset-progress
TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: 
[--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: 
[--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] 
[--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers 
VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: 
[--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: [--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 6: [--micro-batch-size MICRO_BATCH_SIZE] 6: [--batch-size BATCH_SIZE] 6: [--global-batch-size GLOBAL_BATCH_SIZE] 6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 6: [--checkpoint-activations] 6: [--distribute-checkpointed-activations] 6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 6: [--train-iters TRAIN_ITERS] 6: [--train-samples TRAIN_SAMPLES] 6: [--train-tokens 
TRAIN_TOKENS] 6: [--log-interval LOG_INTERVAL] 6: [--exit-interval EXIT_INTERVAL] 6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 6: [--tensorboard-dir TENSORBOARD_DIR] 6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 6: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 6: [--use-bnb-optimizer] 6: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 6: [--eval-only EVAL_ONLY] 6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 6: [--inference] 6: [--abort-on-unmet-fused-kernel-constraints] 6: [--pp-partition-method PP_PARTITION_METHOD] 6: [--seed SEED] [--init-method-std INIT_METHOD_STD] 6: [--init-method-xavier-uniform] [--lr LR] 6: [--lr-decay-style {constant,linear,cosine}] 6: [--lr-decay-iters LR_DECAY_ITERS] 6: [--lr-decay-samples LR_DECAY_SAMPLES] 6: [--lr-decay-tokens LR_DECAY_TOKENS] 6: [--lr-warmup-fraction LR_WARMUP_FRACTION] 6: [--lr-warmup-iters LR_WARMUP_ITERS] 6: [--lr-warmup-samples LR_WARMUP_SAMPLES] 6: [--warmup WARMUP] [--min-lr MIN_LR] 6: [--override-lr-scheduler] 6: [--use-checkpoint-lr-scheduler] 6: [--universal-checkpoint] [--save SAVE] 6: [--save-interval SAVE_INTERVAL] [--no-save-optim] 6: [--no-save-rng] [--load LOAD] [--no-load-optim] 6: [--no-load-rng] [--finetune] [--fp16] [--bf16] 6: [--loss-scale LOSS_SCALE] 6: [--initial-loss-scale INITIAL_LOSS_SCALE] 6: [--min-loss-scale MIN_LOSS_SCALE] 6: [--loss-scale-window LOSS_SCALE_WINDOW] 6: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 6: [--no-query-key-layer-scaling] 6: [--attention-softmax-in-fp32] 6: [--accumulate-allreduce-grads-in-fp32] 6: [--fp16-lm-cross-entropy] 6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 6: [--model-parallel-size MODEL_PARALLEL_SIZE] 6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 
6: [--distributed-backend {nccl,gloo}] 6: [--DDP-impl {local,torch}] 6: [--use-contiguous-buffers-in-ddp] 6: [--no-scatter-gather-tensors-in-pipeline] 6: [--local_rank LOCAL_RANK] 6: [--lazy-mpu-init LAZY_MPU_INIT] 6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 6: [--eval-interval EVAL_INTERVAL] 6: [--data-path [DATA_PATH ...]] [--split SPLIT] 6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 6: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob 
QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: [--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: 
[--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 6: [--micro-batch-size MICRO_BATCH_SIZE] 6: [--batch-size BATCH_SIZE] 6: [--global-batch-size GLOBAL_BATCH_SIZE] 6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 6: [--checkpoint-activations] 6: [--distribute-checkpointed-activations] 6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 6: [--train-iters TRAIN_ITERS] 6: [--train-samples TRAIN_SAMPLES] 6: [--train-tokens TRAIN_TOKENS] 6: [--log-interval LOG_INTERVAL] 6: [--exit-interval EXIT_INTERVAL] 6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 6: [--tensorboard-dir TENSORBOARD_DIR] 6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 6: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 6: [--use-bnb-optimizer] 6: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 6: [--eval-only EVAL_ONLY] 6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 6: [--inference] 6: [--abort-on-unmet-fused-kernel-constraints] 6: [--pp-partition-method PP_PARTITION_METHOD] 6: [--seed SEED] [--init-method-std INIT_METHOD_STD] 6: 
[--init-method-xavier-uniform] [--lr LR] 6: [--lr-decay-style {constant,linear,cosine}] 6: [--lr-decay-iters LR_DECAY_ITERS] 6: [--lr-decay-samples LR_DECAY_SAMPLES] 6: [--lr-decay-tokens LR_DECAY_TOKENS] 6: [--lr-warmup-fraction LR_WARMUP_FRACTION] 6: [--lr-warmup-iters LR_WARMUP_ITERS] 6: [--lr-warmup-samples LR_WARMUP_SAMPLES] 6: [--warmup WARMUP] [--min-lr MIN_LR] 6: [--override-lr-scheduler] 6: [--use-checkpoint-lr-scheduler] 6: [--universal-checkpoint] [--save SAVE] 6: [--save-interval SAVE_INTERVAL] [--no-save-optim] 6: [--no-save-rng] [--load LOAD] [--no-load-optim] 6: [--no-load-rng] [--finetune] [--fp16] [--bf16] 6: [--loss-scale LOSS_SCALE] 6: [--initial-loss-scale INITIAL_LOSS_SCALE] 6: [--min-loss-scale MIN_LOSS_SCALE] 6: [--loss-scale-window LOSS_SCALE_WINDOW] 6: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 6: [--no-query-key-layer-scaling] 6: [--attention-softmax-in-fp32] 6: [--accumulate-allreduce-grads-in-fp32] 6: [--fp16-lm-cross-entropy] 6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 6: [--model-parallel-size MODEL_PARALLEL_SIZE] 6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 6: [--distributed-backend {nccl,gloo}] 6: [--DDP-impl {local,torch}] 6: [--use-contiguous-buffers-in-ddp] 6: [--no-scatter-gather-tensors-in-pipeline] 6: [--local_rank LOCAL_RANK] 6: [--lazy-mpu-init LAZY_MPU_INIT] 6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 6: [--eval-interval EVAL_INTERVAL] 6: [--data-path [DATA_PATH ...]] [--split SPLIT] 6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 6: [--test-weighted-split-paths-path 
TEST_WEIGHTED_SPLIT_PATHS_PATH] 6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 6: [--merge-file MERGE_FILE] 6: [--vocab-extra-ids VOCAB_EXTRA_IDS] 6: [--seq-length SEQ_LENGTH] 6: [--encoder-seq-length ENCODER_SEQ_LENGTH] 6: [--decoder-seq-length DECODER_SEQ_LENGTH] 6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 6: [--num-workers NUM_WORKERS] 6: [--valid-num-workers VALID_NUM_WORKERS] 6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 6: [--data-impl {lazy,cached,mmap,infer}] 6: [--reset-position-ids] [--reset-attention-mask] 6: [--eod-mask-loss] [--loss-on-targets-only] 6: [--reweight-loss-based-on-position-frequency] 6: [--noise-density NOISE_DENSITY] 6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 6: [--adlr-autoresume] 6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 6: [--ict-head-size ICT_HEAD_SIZE] 6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 6: [--biencoder-shared-query-context-model] 6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 6: [--titles-data-path TITLES_DATA_PATH] 6: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 6: [--use-one-sent-docs] 6: [--evidence-data-path EVIDENCE_DATA_PATH] 6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 6: [--retriever-score-scaling] 6: [--block-data-path BLOCK_DATA_PATH] 6: [--embedding-path EMBEDDING_PATH] 6: [--indexer-batch-size INDEXER_BATCH_SIZE] 6: [--indexer-log-interval INDEXER_LOG_INTERVAL] 6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 6: [--log-params-norm] [--log-num-zeros-in-grad] 6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 6: [--log-timers-to-tensorboard] 6: 
[--log-batch-size-to-tensorboard] 6: [--no-log-learnig-rate-to-tensorboard] 6: [--no-log-loss-scale-to-tensorboard] 6: [--log-validation-ppl-to-tensorboard] 6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 6: [--zero-contigious-gradients] 6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 6: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 6: [--scattered-embeddings] [--split-transformers] 6: [--memory-centric-tiled-linear] 6: [--tile-factor TILE_FACTOR] 6: [--deepspeed-activation-checkpointing] 6: [--partition-activations] [--contigious-checkpointing] 6: [--checkpoint-in-cpu] [--synchronize-each-layer] 6: [--profile-backward] [--deepspeed] 6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 6: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 6: [--hidden-size HIDDEN_SIZE] 6: [--ffn-hidden-size FFN_HIDDEN_SIZE] 6: [--num-attention-heads NUM_ATTENTION_HEADS] 6: [--kv-channels KV_CHANNELS] 6: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 6: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 6: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 6: [--layernorm-epsilon LAYERNORM_EPSILON] 6: [--sync-tp-duplicated-parameters] 6: [--apply-residual-connection-post-layernorm] 6: [--embed-layernorm] [--openai-gelu] 6: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 6: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 6: [--glu-activation {geglu,liglu,reglu,swiglu}] 6: [--kill-switch-path KILL_SWITCH_PATH] 6: [--log-level {debug,info,warning,error,critical}] 6: [--log-level-replica {debug,info,warning,error,critical}] 6: [--attention-dropout ATTENTION_DROPOUT] 6: [--hidden-dropout HIDDEN_DROPOUT] 6: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 6: [--adam-beta1 ADAM_BETA1] [--adam-beta2 
ADAM_BETA2]
6: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
6: [--micro-batch-size MICRO_BATCH_SIZE]
6: [--batch-size BATCH_SIZE]
6: [--global-batch-size GLOBAL_BATCH_SIZE]
6: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
6: [--checkpoint-activations]
6: [--distribute-checkpointed-activations]
6: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
6: [--train-iters TRAIN_ITERS]
6: [--train-samples TRAIN_SAMPLES]
6: [--train-tokens TRAIN_TOKENS]
6: [--log-interval LOG_INTERVAL]
6: [--exit-interval EXIT_INTERVAL]
6: [--exit-duration-in-mins EXIT_DURATION_IN_MINS]
6: [--tensorboard-dir TENSORBOARD_DIR]
6: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion]
6: [--no-bias-dropout-fusion] [--no-layer-norm-fusion]
6: [--no-optimizer-fusion] [--optimizer {adam,sgd}]
6: [--use-bnb-optimizer]
6: [--dataloader-type {single,cyclic}] [--cpu-optimizer]
6: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR]
6: [--eval-only EVAL_ONLY]
6: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]]
6: [--inference]
6: [--abort-on-unmet-fused-kernel-constraints]
6: [--pp-partition-method PP_PARTITION_METHOD]
6: [--seed SEED] [--init-method-std INIT_METHOD_STD]
6: [--init-method-xavier-uniform] [--lr LR]
6: [--lr-decay-style {constant,linear,cosine}]
6: [--lr-decay-iters LR_DECAY_ITERS]
6: [--lr-decay-samples LR_DECAY_SAMPLES]
6: [--lr-decay-tokens LR_DECAY_TOKENS]
6: [--lr-warmup-fraction LR_WARMUP_FRACTION]
6: [--lr-warmup-iters LR_WARMUP_ITERS]
6: [--lr-warmup-samples LR_WARMUP_SAMPLES]
6: [--warmup WARMUP] [--min-lr MIN_LR]
6: [--override-lr-scheduler]
6: [--use-checkpoint-lr-scheduler]
6: [--universal-checkpoint] [--save SAVE]
6: [--save-interval SAVE_INTERVAL] [--no-save-optim]
6: [--no-save-rng] [--load LOAD] [--no-load-optim]
6: [--no-load-rng] [--finetune] [--fp16] [--bf16]
6: [--loss-scale LOSS_SCALE]
6: [--initial-loss-scale INITIAL_LOSS_SCALE]
6: [--min-loss-scale MIN_LOSS_SCALE]
6: [--loss-scale-window LOSS_SCALE_WINDOW]
6: [--hysteresis HYSTERESIS] [--fp32-residual-connection]
6: [--no-query-key-layer-scaling]
6: [--attention-softmax-in-fp32]
6: [--accumulate-allreduce-grads-in-fp32]
6: [--fp16-lm-cross-entropy]
6: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
6: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
6: [--model-parallel-size MODEL_PARALLEL_SIZE]
6: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
6: [--distributed-backend {nccl,gloo}]
6: [--DDP-impl {local,torch}]
6: [--use-contiguous-buffers-in-ddp]
6: [--no-scatter-gather-tensors-in-pipeline]
6: [--local_rank LOCAL_RANK]
6: [--lazy-mpu-init LAZY_MPU_INIT]
6: [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
6: [--eval-interval EVAL_INTERVAL]
6: [--data-path [DATA_PATH ...]] [--split SPLIT]
6: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
6: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
6: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
6: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
6: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
6: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
6: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
6: [--merge-file MERGE_FILE]
6: [--vocab-extra-ids VOCAB_EXTRA_IDS]
6: [--seq-length SEQ_LENGTH]
6: [--encoder-seq-length ENCODER_SEQ_LENGTH]
6: [--decoder-seq-length DECODER_SEQ_LENGTH]
6: [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
6: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
6: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
6: [--num-workers NUM_WORKERS]
6: [--valid-num-workers VALID_NUM_WORKERS]
6: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
6: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
6: [--data-impl {lazy,cached,mmap,infer}]
6: [--reset-position-ids] [--reset-attention-mask]
6: [--eod-mask-loss] [--loss-on-targets-only]
6: [--reweight-loss-based-on-position-frequency]
6: [--noise-density NOISE_DENSITY]
6: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
6: [--adlr-autoresume]
6: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
6: [--ict-head-size ICT_HEAD_SIZE]
6: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
6: [--biencoder-shared-query-context-model]
6: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
6: [--titles-data-path TITLES_DATA_PATH]
6: [--query-in-block-prob QUERY_IN_BLOCK_PROB]
6: [--use-one-sent-docs]
6: [--evidence-data-path EVIDENCE_DATA_PATH]
6: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
6: [--retriever-score-scaling]
6: [--block-data-path BLOCK_DATA_PATH]
6: [--embedding-path EMBEDDING_PATH]
6: [--indexer-batch-size INDEXER_BATCH_SIZE]
6: [--indexer-log-interval INDEXER_LOG_INTERVAL]
6: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
6: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
6: [--log-params-norm] [--log-num-zeros-in-grad]
6: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
6: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
6: [--log-timers-to-tensorboard]
6: [--log-batch-size-to-tensorboard]
6: [--no-log-learnig-rate-to-tensorboard]
6: [--no-log-loss-scale-to-tensorboard]
6: [--log-validation-ppl-to-tensorboard]
6: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
6: [--zero-contigious-gradients]
6: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
6: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
6: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
6: [--scattered-embeddings] [--split-transformers]
6: [--memory-centric-tiled-linear]
6: [--tile-factor TILE_FACTOR]
6: [--deepspeed-activation-checkpointing]
6: [--partition-activations] [--contigious-checkpointing]
6: [--checkpoint-in-cpu] [--synchronize-each-layer]
6: [--profile-backward] [--deepspeed]
6: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
6: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
6: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
0: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
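Every failing rank reports the same argparse error: this checkout of `pretrain_gpt.py` does not define a `--reset-progress` option, so the usual fix is to drop the flag from the launch script or switch to a code version that registers it. A minimal sketch of how argparse produces and avoids this failure (the two-flag parser below is a hypothetical stand-in, not the real Megatron argument table):

```python
import argparse

# Hypothetical stand-in for pretrain_gpt.py's parser (two flags only).
parser = argparse.ArgumentParser(prog="pretrain_gpt.py")
parser.add_argument("--train-iters", type=int)

argv = ["--train-iters", "1000", "--reset-progress"]

# parser.parse_args(argv) would print the usage text and exit with:
#   pretrain_gpt.py: error: unrecognized arguments: --reset-progress

# Option 1: tolerate flags the script does not know.
args, unknown = parser.parse_known_args(argv)
print(unknown)  # ['--reset-progress']

# Option 2: register the flag so parse_args accepts it.
parser.add_argument("--reset-progress", action="store_true")
args = parser.parse_args(argv)
print(args.reset_progress)  # True
```

Because argparse prints the full usage text before exiting, every rank that hits the unknown flag dumps the whole option list, which is why the message repeats once per rank above.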
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
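The interleaved `/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory` lines are a separate issue from the argparse failure: libdrm's amdgpu backend reads that database only to map PCI device IDs to GPU marketing names, and it falls back to generic names when the file is absent, so the warning is generally cosmetic. A quick existence check (the path is taken from the log; that the warning is harmless on your particular stack is an assumption worth verifying):

```python
import os.path

# libdrm's amdgpu backend uses this database only to pretty-print
# GPU marketing names; the log's warning means it could not be opened.
ids = "/opt/amdgpu/share/libdrm/amdgpu.ids"
status = "present" if os.path.isfile(ids) else "missing"
print("amdgpu.ids is", status)
```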
3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7:
[--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: 
[--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: 
[--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 7: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] 
[--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters 
EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES 
...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 
0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction 
LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length 
DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size 
ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica 
{debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: 
[--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: [--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: 
[--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank 
LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: [--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES 
[RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
0: [--retriever-score-scaling]
0: [--block-data-path BLOCK_DATA_PATH]
0: [--embedding-path EMBEDDING_PATH]
0: [--indexer-batch-size INDEXER_BATCH_SIZE]
0: [--indexer-log-interval INDEXER_LOG_INTERVAL]
0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
0: [--log-params-norm] [--log-num-zeros-in-grad]
0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
0: [--log-timers-to-tensorboard]
0: [--log-batch-size-to-tensorboard]
0: [--no-log-learnig-rate-to-tensorboard]
0: [--no-log-loss-scale-to-tensorboard]
0: [--log-validation-ppl-to-tensorboard]
0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
0: [--zero-contigious-gradients]
0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
0: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
0: [--scattered-embeddings] [--split-transformers]
0: [--memory-centric-tiled-linear]
0: [--tile-factor TILE_FACTOR]
0: [--deepspeed-activation-checkpointing]
0: [--partition-activations] [--contigious-checkpointing]
0: [--checkpoint-in-cpu] [--synchronize-each-layer]
0: [--profile-backward] [--deepspeed]
0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
7: [--hidden-size HIDDEN_SIZE]
7: [--ffn-hidden-size FFN_HIDDEN_SIZE]
7: [--num-attention-heads NUM_ATTENTION_HEADS]
7: [--kv-channels KV_CHANNELS]
7: [--max-position-embeddings
MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 
7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: 
[--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: [--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 
7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 7: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 7: [--hidden-size HIDDEN_SIZE] 7: [--ffn-hidden-size FFN_HIDDEN_SIZE] 7: [--num-attention-heads NUM_ATTENTION_HEADS] 7: [--kv-channels KV_CHANNELS] 7: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 7: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 7: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 7: [--layernorm-epsilon LAYERNORM_EPSILON] 7: [--sync-tp-duplicated-parameters] 7: [--apply-residual-connection-post-layernorm] 7: [--embed-layernorm] [--openai-gelu] 7: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 7: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 7: [--glu-activation {geglu,liglu,reglu,swiglu}] 7: [--kill-switch-path KILL_SWITCH_PATH] 7: [--log-level {debug,info,warning,error,critical}] 7: [--log-level-replica {debug,info,warning,error,critical}] 7: [--attention-dropout ATTENTION_DROPOUT] 7: [--hidden-dropout HIDDEN_DROPOUT] 7: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 7: [--adam-beta1 ADAM_BETA1] [--adam-beta2 
ADAM_BETA2] 7: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 7: [--micro-batch-size MICRO_BATCH_SIZE] 7: [--batch-size BATCH_SIZE] 7: [--global-batch-size GLOBAL_BATCH_SIZE] 7: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 7: [--checkpoint-activations] 7: [--distribute-checkpointed-activations] 7: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 7: [--train-iters TRAIN_ITERS] 7: [--train-samples TRAIN_SAMPLES] 7: [--train-tokens TRAIN_TOKENS] 7: [--log-interval LOG_INTERVAL] 7: [--exit-interval EXIT_INTERVAL] 7: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 7: [--tensorboard-dir TENSORBOARD_DIR] 7: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 7: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 7: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 7: [--use-bnb-optimizer] 7: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 7: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 7: [--eval-only EVAL_ONLY] 7: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 7: [--inference] 7: [--abort-on-unmet-fused-kernel-constraints] 7: [--pp-partition-method PP_PARTITION_METHOD] 7: [--seed SEED] [--init-method-std INIT_METHOD_STD] 7: [--init-method-xavier-uniform] [--lr LR] 7: [--lr-decay-style {constant,linear,cosine}] 7: [--lr-decay-iters LR_DECAY_ITERS] 7: [--lr-decay-samples LR_DECAY_SAMPLES] 7: [--lr-decay-tokens LR_DECAY_TOKENS] 7: [--lr-warmup-fraction LR_WARMUP_FRACTION] 7: [--lr-warmup-iters LR_WARMUP_ITERS] 7: [--lr-warmup-samples LR_WARMUP_SAMPLES] 7: [--warmup WARMUP] [--min-lr MIN_LR] 7: [--override-lr-scheduler] 7: [--use-checkpoint-lr-scheduler] 7: [--universal-checkpoint] [--save SAVE] 7: [--save-interval SAVE_INTERVAL] [--no-save-optim] 7: [--no-save-rng] [--load LOAD] [--no-load-optim] 7: [--no-load-rng] [--finetune] [--fp16] [--bf16] 7: [--loss-scale LOSS_SCALE] 7: [--initial-loss-scale INITIAL_LOSS_SCALE] 7: [--min-loss-scale MIN_LOSS_SCALE] 7: [--loss-scale-window LOSS_SCALE_WINDOW] 7: 
[--hysteresis HYSTERESIS] [--fp32-residual-connection] 7: [--no-query-key-layer-scaling] 7: [--attention-softmax-in-fp32] 7: [--accumulate-allreduce-grads-in-fp32] 7: [--fp16-lm-cross-entropy] 7: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 7: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 7: [--model-parallel-size MODEL_PARALLEL_SIZE] 7: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 7: [--distributed-backend {nccl,gloo}] 7: [--DDP-impl {local,torch}] 7: [--use-contiguous-buffers-in-ddp] 7: [--no-scatter-gather-tensors-in-pipeline] 7: [--local_rank LOCAL_RANK] 7: [--lazy-mpu-init LAZY_MPU_INIT] 7: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 7: [--eval-interval EVAL_INTERVAL] 7: [--data-path [DATA_PATH ...]] [--split SPLIT] 7: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 7: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 7: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 7: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 7: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 7: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 7: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 7: [--merge-file MERGE_FILE] 7: [--vocab-extra-ids VOCAB_EXTRA_IDS] 7: [--seq-length SEQ_LENGTH] 7: [--encoder-seq-length ENCODER_SEQ_LENGTH] 7: [--decoder-seq-length DECODER_SEQ_LENGTH] 7: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 7: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 7: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 7: [--num-workers NUM_WORKERS] 7: [--valid-num-workers VALID_NUM_WORKERS] 7: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 7: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 7: [--data-impl {lazy,cached,mmap,infer}] 7: [--reset-position-ids] [--reset-attention-mask] 7: [--eod-mask-loss] [--loss-on-targets-only] 7: 
[--reweight-loss-based-on-position-frequency] 7: [--noise-density NOISE_DENSITY] 7: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 7: [--adlr-autoresume] 7: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 7: [--ict-head-size ICT_HEAD_SIZE] 7: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 7: [--biencoder-shared-query-context-model] 7: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 7: [--titles-data-path TITLES_DATA_PATH] 7: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 7: [--use-one-sent-docs] 7: [--evidence-data-path EVIDENCE_DATA_PATH] 7: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 7: [--retriever-score-scaling] 7: [--block-data-path BLOCK_DATA_PATH] 7: [--embedding-path EMBEDDING_PATH] 7: [--indexer-batch-size INDEXER_BATCH_SIZE] 7: [--indexer-log-interval INDEXER_LOG_INTERVAL] 7: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 7: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 7: [--log-params-norm] [--log-num-zeros-in-grad] 7: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 7: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 7: [--log-timers-to-tensorboard] 7: [--log-batch-size-to-tensorboard] 7: [--no-log-learnig-rate-to-tensorboard] 7: [--no-log-loss-scale-to-tensorboard] 7: [--log-validation-ppl-to-tensorboard] 7: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 7: [--zero-contigious-gradients] 7: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 7: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 7: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 7: [--scattered-embeddings] [--split-transformers] 7: [--memory-centric-tiled-linear] 7: [--tile-factor TILE_FACTOR] 7: [--deepspeed-activation-checkpointing] 7: [--partition-activations] [--contigious-checkpointing] 7: [--checkpoint-in-cpu] [--synchronize-each-layer] 7: [--profile-backward] [--deepspeed] 7: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 7: [--deepscale_config DEEPSCALE_CONFIG] 
[--deepspeed_mpi] 7: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 0: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 0: [--hidden-size HIDDEN_SIZE] 0: [--ffn-hidden-size FFN_HIDDEN_SIZE] 0: [--num-attention-heads NUM_ATTENTION_HEADS] 0: [--kv-channels KV_CHANNELS] 0: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 0: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 0: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 0: [--layernorm-epsilon LAYERNORM_EPSILON] 0: [--sync-tp-duplicated-parameters] 0: [--apply-residual-connection-post-layernorm] 0: [--embed-layernorm] [--openai-gelu] 0: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 0: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 0: [--glu-activation {geglu,liglu,reglu,swiglu}] 0: [--kill-switch-path KILL_SWITCH_PATH] 0: [--log-level {debug,info,warning,error,critical}] 0: [--log-level-replica {debug,info,warning,error,critical}] 0: [--attention-dropout ATTENTION_DROPOUT] 0: [--hidden-dropout HIDDEN_DROPOUT] 0: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 0: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 0: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 0: [--micro-batch-size MICRO_BATCH_SIZE] 0: [--batch-size BATCH_SIZE] 0: [--global-batch-size GLOBAL_BATCH_SIZE] 0: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 0: [--checkpoint-activations] 0: [--distribute-checkpointed-activations] 0: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 0: [--train-iters TRAIN_ITERS] 0: [--train-samples TRAIN_SAMPLES] 0: [--train-tokens TRAIN_TOKENS] 0: [--log-interval LOG_INTERVAL] 0: [--exit-interval EXIT_INTERVAL] 0: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 0: [--tensorboard-dir TENSORBOARD_DIR] 0: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 0: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 0: [--no-optimizer-fusion] 
[--optimizer {adam,sgd}] 0: [--use-bnb-optimizer] 0: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 0: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 0: [--eval-only EVAL_ONLY] 0: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 0: [--inference] 0: [--abort-on-unmet-fused-kernel-constraints] 0: [--pp-partition-method PP_PARTITION_METHOD] 0: [--seed SEED] [--init-method-std INIT_METHOD_STD] 0: [--init-method-xavier-uniform] [--lr LR] 0: [--lr-decay-style {constant,linear,cosine}] 0: [--lr-decay-iters LR_DECAY_ITERS] 0: [--lr-decay-samples LR_DECAY_SAMPLES] 0: [--lr-decay-tokens LR_DECAY_TOKENS] 0: [--lr-warmup-fraction LR_WARMUP_FRACTION] 0: [--lr-warmup-iters LR_WARMUP_ITERS] 0: [--lr-warmup-samples LR_WARMUP_SAMPLES] 0: [--warmup WARMUP] [--min-lr MIN_LR] 0: [--override-lr-scheduler] 0: [--use-checkpoint-lr-scheduler] 0: [--universal-checkpoint] [--save SAVE] 0: [--save-interval SAVE_INTERVAL] [--no-save-optim] 0: [--no-save-rng] [--load LOAD] [--no-load-optim] 0: [--no-load-rng] [--finetune] [--fp16] [--bf16] 0: [--loss-scale LOSS_SCALE] 0: [--initial-loss-scale INITIAL_LOSS_SCALE] 0: [--min-loss-scale MIN_LOSS_SCALE] 0: [--loss-scale-window LOSS_SCALE_WINDOW] 0: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 0: [--no-query-key-layer-scaling] 0: [--attention-softmax-in-fp32] 0: [--accumulate-allreduce-grads-in-fp32] 0: [--fp16-lm-cross-entropy] 0: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 0: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 0: [--model-parallel-size MODEL_PARALLEL_SIZE] 0: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 0: [--distributed-backend {nccl,gloo}] 0: [--DDP-impl {local,torch}] 0: [--use-contiguous-buffers-in-ddp] 0: [--no-scatter-gather-tensors-in-pipeline] 0: [--local_rank LOCAL_RANK] 0: [--lazy-mpu-init LAZY_MPU_INIT] 0: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 0: [--eval-interval EVAL_INTERVAL] 0: 
[--data-path [DATA_PATH ...]] [--split SPLIT] 0: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 0: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 0: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 0: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 0: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 0: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 0: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 0: [--merge-file MERGE_FILE] 0: [--vocab-extra-ids VOCAB_EXTRA_IDS] 0: [--seq-length SEQ_LENGTH] 0: [--encoder-seq-length ENCODER_SEQ_LENGTH] 0: [--decoder-seq-length DECODER_SEQ_LENGTH] 0: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 0: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 0: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 0: [--num-workers NUM_WORKERS] 0: [--valid-num-workers VALID_NUM_WORKERS] 0: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 0: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 0: [--data-impl {lazy,cached,mmap,infer}] 0: [--reset-position-ids] [--reset-attention-mask] 0: [--eod-mask-loss] [--loss-on-targets-only] 0: [--reweight-loss-based-on-position-frequency] 0: [--noise-density NOISE_DENSITY] 0: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 0: [--adlr-autoresume] 0: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 0: [--ict-head-size ICT_HEAD_SIZE] 0: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 0: [--biencoder-shared-query-context-model] 0: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 0: [--titles-data-path TITLES_DATA_PATH] 0: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 0: [--use-one-sent-docs] 0: [--evidence-data-path EVIDENCE_DATA_PATH] 0: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 0: [--retriever-score-scaling] 0: [--block-data-path BLOCK_DATA_PATH] 0: [--embedding-path EMBEDDING_PATH] 0: 
[--indexer-batch-size INDEXER_BATCH_SIZE] 0: [--indexer-log-interval INDEXER_LOG_INTERVAL] 0: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 0: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 0: [--log-params-norm] [--log-num-zeros-in-grad] 0: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 0: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 0: [--log-timers-to-tensorboard] 0: [--log-batch-size-to-tensorboard] 0: [--no-log-learnig-rate-to-tensorboard] 0: [--no-log-loss-scale-to-tensorboard] 0: [--log-validation-ppl-to-tensorboard] 0: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 0: [--zero-contigious-gradients] 0: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 0: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 0: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 0: [--scattered-embeddings] [--split-transformers] 0: [--memory-centric-tiled-linear] 0: [--tile-factor TILE_FACTOR] 0: [--deepspeed-activation-checkpointing] 0: [--partition-activations] [--contigious-checkpointing] 0: [--checkpoint-in-cpu] [--synchronize-each-layer] 0: [--profile-backward] [--deepspeed] 0: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 0: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 0: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type 
{PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples 
LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate 
SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: 
1: [--remote-device {none,cpu,nvme}] [--use-pin-memory]
1: [--scattered-embeddings] [--split-transformers]
1: [--memory-centric-tiled-linear]
1: [--tile-factor TILE_FACTOR]
1: [--deepspeed-activation-checkpointing]
1: [--partition-activations] [--contigious-checkpointing]
1: [--checkpoint-in-cpu] [--synchronize-each-layer]
1: [--profile-backward] [--deepspeed]
1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
1: [--hidden-size HIDDEN_SIZE]
1: [--ffn-hidden-size FFN_HIDDEN_SIZE]
1: [--num-attention-heads NUM_ATTENTION_HEADS]
1: [--kv-channels KV_CHANNELS]
1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
1: [--layernorm-epsilon LAYERNORM_EPSILON]
1: [--sync-tp-duplicated-parameters]
1: [--apply-residual-connection-post-layernorm]
1: [--embed-layernorm] [--openai-gelu]
1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
1: [--glu-activation {geglu,liglu,reglu,swiglu}]
1: [--kill-switch-path KILL_SWITCH_PATH]
1: [--log-level {debug,info,warning,error,critical}]
1: [--log-level-replica {debug,info,warning,error,critical}]
1: [--attention-dropout ATTENTION_DROPOUT]
1: [--hidden-dropout HIDDEN_DROPOUT]
1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
1: [--micro-batch-size MICRO_BATCH_SIZE]
[--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5: [--kv-channels KV_CHANNELS] 5: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 5: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 5: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 5: 
[--layernorm-epsilon LAYERNORM_EPSILON] 5: [--sync-tp-duplicated-parameters] 5: [--apply-residual-connection-post-layernorm] 5: [--embed-layernorm] [--openai-gelu] 5: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 5: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 5: [--glu-activation {geglu,liglu,reglu,swiglu}] 5: [--kill-switch-path KILL_SWITCH_PATH] 5: [--log-level {debug,info,warning,error,critical}] 5: [--log-level-replica {debug,info,warning,error,critical}] 5: [--attention-dropout ATTENTION_DROPOUT] 5: [--hidden-dropout HIDDEN_DROPOUT] 5: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 5: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 5: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 5: [--micro-batch-size MICRO_BATCH_SIZE] 5: [--batch-size BATCH_SIZE] 5: [--global-batch-size GLOBAL_BATCH_SIZE] 5: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 5: [--checkpoint-activations] 5: [--distribute-checkpointed-activations] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 
1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 5: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 5: [--train-iters TRAIN_ITERS] 5: [--train-samples TRAIN_SAMPLES] 5: [--train-tokens TRAIN_TOKENS] 5: [--log-interval LOG_INTERVAL] 5: [--exit-interval EXIT_INTERVAL] 5: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 5: [--tensorboard-dir TENSORBOARD_DIR] 5: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 5: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 5: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 5: [--use-bnb-optimizer] 5: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 5: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 5: [--eval-only EVAL_ONLY] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 5: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 5: [--inference] 5: [--abort-on-unmet-fused-kernel-constraints] 5: [--pp-partition-method PP_PARTITION_METHOD] 5: [--seed SEED] [--init-method-std 
INIT_METHOD_STD] 5: [--init-method-xavier-uniform] [--lr LR] 5: [--lr-decay-style {constant,linear,cosine}] 5: [--lr-decay-iters LR_DECAY_ITERS] 5: [--lr-decay-samples LR_DECAY_SAMPLES] 5: [--lr-decay-tokens LR_DECAY_TOKENS] 5: [--lr-warmup-fraction LR_WARMUP_FRACTION] 5: [--lr-warmup-iters LR_WARMUP_ITERS] 5: [--lr-warmup-samples LR_WARMUP_SAMPLES] 5: [--warmup WARMUP] [--min-lr MIN_LR] 5: [--override-lr-scheduler] 5: [--use-checkpoint-lr-scheduler] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 5: [--universal-checkpoint] [--save SAVE] 5: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 5: [--no-save-rng] [--load LOAD] [--no-load-optim] 5: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 5: [--loss-scale LOSS_SCALE] 5: [--initial-loss-scale INITIAL_LOSS_SCALE] 5: [--min-loss-scale MIN_LOSS_SCALE] 5: [--loss-scale-window LOSS_SCALE_WINDOW] 5: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 5: [--no-query-key-layer-scaling] 5: [--attention-softmax-in-fp32] 5: [--accumulate-allreduce-grads-in-fp32] 5: [--fp16-lm-cross-entropy] 5: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 5: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 5: [--model-parallel-size MODEL_PARALLEL_SIZE] 5: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 5: [--distributed-backend {nccl,gloo}] 5: [--DDP-impl {local,torch}] 5: [--use-contiguous-buffers-in-ddp] 5: [--no-scatter-gather-tensors-in-pipeline] 5: [--local_rank LOCAL_RANK] 5: [--lazy-mpu-init LAZY_MPU_INIT] 5: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 5: [--eval-interval EVAL_INTERVAL] 5: [--data-path [DATA_PATH ...]] [--split SPLIT] 5: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 5: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 5: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 5: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 5: 
[--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 5: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 5: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 5: [--merge-file MERGE_FILE] 5: [--vocab-extra-ids VOCAB_EXTRA_IDS] 5: [--seq-length SEQ_LENGTH] 5: [--encoder-seq-length ENCODER_SEQ_LENGTH] 5: [--decoder-seq-length DECODER_SEQ_LENGTH] 5: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 5: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 5: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 5: [--num-workers NUM_WORKERS] 5: [--valid-num-workers VALID_NUM_WORKERS] 5: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 5: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: 
[--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 5: [--data-impl {lazy,cached,mmap,infer}] 5: [--reset-position-ids] [--reset-attention-mask] 5: [--eod-mask-loss] [--loss-on-targets-only] 5: [--reweight-loss-based-on-position-frequency] 5: [--noise-density NOISE_DENSITY] 5: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 5: [--adlr-autoresume] 5: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 5: [--ict-head-size ICT_HEAD_SIZE] 5: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 5: [--biencoder-shared-query-context-model] 5: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 5: [--titles-data-path TITLES_DATA_PATH] 5: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 5: [--use-one-sent-docs] 5: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 5: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 5: [--retriever-score-scaling] 5: [--block-data-path BLOCK_DATA_PATH] 5: [--embedding-path EMBEDDING_PATH] 5: [--indexer-batch-size INDEXER_BATCH_SIZE] 5: [--indexer-log-interval INDEXER_LOG_INTERVAL] 5: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 5: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 5: [--log-params-norm] [--log-num-zeros-in-grad] 5: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 5: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 5: [--log-timers-to-tensorboard] 5: [--log-batch-size-to-tensorboard] 5: [--no-log-learnig-rate-to-tensorboard] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 5: [--no-log-loss-scale-to-tensorboard] 5: [--log-validation-ppl-to-tensorboard] 5: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 5: [--zero-contigious-gradients] 5: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 5: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 5: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 5: [--scattered-embeddings] [--split-transformers] 5: [--memory-centric-tiled-linear] 5: [--tile-factor TILE_FACTOR] 5: [--deepspeed-activation-checkpointing] 5: [--partition-activations] [--contigious-checkpointing] 5: [--checkpoint-in-cpu] [--synchronize-each-layer] 5: [--profile-backward] [--deepspeed] 5: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 5: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: 
[--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: 
[--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: 
[--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: 
[--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank 
LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES 
[RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: 
[--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples 
LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: [--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length 
SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] 
[--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 1: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 1: [--hidden-size HIDDEN_SIZE] 1: [--ffn-hidden-size FFN_HIDDEN_SIZE] 1: [--num-attention-heads NUM_ATTENTION_HEADS] 1: [--kv-channels KV_CHANNELS] 1: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 1: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 1: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 1: [--layernorm-epsilon LAYERNORM_EPSILON] 1: [--sync-tp-duplicated-parameters] 1: [--apply-residual-connection-post-layernorm] 1: [--embed-layernorm] [--openai-gelu] 1: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 1: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 1: [--glu-activation {geglu,liglu,reglu,swiglu}] 1: [--kill-switch-path KILL_SWITCH_PATH] 1: [--log-level {debug,info,warning,error,critical}] 1: [--log-level-replica {debug,info,warning,error,critical}] 1: [--attention-dropout ATTENTION_DROPOUT] 1: [--hidden-dropout HIDDEN_DROPOUT] 1: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 1: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 1: [--adam-eps ADAM_EPS] [--sgd-momentum 
SGD_MOMENTUM] 1: [--micro-batch-size MICRO_BATCH_SIZE] 1: [--batch-size BATCH_SIZE] 1: [--global-batch-size GLOBAL_BATCH_SIZE] 1: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 1: [--checkpoint-activations] 1: [--distribute-checkpointed-activations] 1: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 1: [--train-iters TRAIN_ITERS] 1: [--train-samples TRAIN_SAMPLES] 1: [--train-tokens TRAIN_TOKENS] 1: [--log-interval LOG_INTERVAL] 1: [--exit-interval EXIT_INTERVAL] 1: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 1: [--tensorboard-dir TENSORBOARD_DIR] 1: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 1: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 1: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 1: [--use-bnb-optimizer] 1: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 1: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 1: [--eval-only EVAL_ONLY] 1: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 1: [--inference] 1: [--abort-on-unmet-fused-kernel-constraints] 1: [--pp-partition-method PP_PARTITION_METHOD] 1: [--seed SEED] [--init-method-std INIT_METHOD_STD] 1: [--init-method-xavier-uniform] [--lr LR] 1: [--lr-decay-style {constant,linear,cosine}] 1: [--lr-decay-iters LR_DECAY_ITERS] 1: [--lr-decay-samples LR_DECAY_SAMPLES] 1: [--lr-decay-tokens LR_DECAY_TOKENS] 1: [--lr-warmup-fraction LR_WARMUP_FRACTION] 1: [--lr-warmup-iters LR_WARMUP_ITERS] 1: [--lr-warmup-samples LR_WARMUP_SAMPLES] 1: [--warmup WARMUP] [--min-lr MIN_LR] 1: [--override-lr-scheduler] 1: [--use-checkpoint-lr-scheduler] 1: [--universal-checkpoint] [--save SAVE] 1: [--save-interval SAVE_INTERVAL] [--no-save-optim] 1: [--no-save-rng] [--load LOAD] [--no-load-optim] 1: [--no-load-rng] [--finetune] [--fp16] [--bf16] 1: [--loss-scale LOSS_SCALE] 1: [--initial-loss-scale INITIAL_LOSS_SCALE] 1: [--min-loss-scale MIN_LOSS_SCALE] 1: [--loss-scale-window LOSS_SCALE_WINDOW] 1: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 1: 
[--no-query-key-layer-scaling] 1: [--attention-softmax-in-fp32] 1: [--accumulate-allreduce-grads-in-fp32] 1: [--fp16-lm-cross-entropy] 1: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 1: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 1: [--model-parallel-size MODEL_PARALLEL_SIZE] 1: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 1: [--distributed-backend {nccl,gloo}] 1: [--DDP-impl {local,torch}] 1: [--use-contiguous-buffers-in-ddp] 1: [--no-scatter-gather-tensors-in-pipeline] 1: [--local_rank LOCAL_RANK] 1: [--lazy-mpu-init LAZY_MPU_INIT] 1: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 1: [--eval-interval EVAL_INTERVAL] 1: [--data-path [DATA_PATH ...]] [--split SPLIT] 1: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 1: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 1: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 1: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 1: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 1: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 1: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 1: [--merge-file MERGE_FILE] 1: [--vocab-extra-ids VOCAB_EXTRA_IDS] 1: [--seq-length SEQ_LENGTH] 1: [--encoder-seq-length ENCODER_SEQ_LENGTH] 1: [--decoder-seq-length DECODER_SEQ_LENGTH] 1: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 1: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 1: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 1: [--num-workers NUM_WORKERS] 1: [--valid-num-workers VALID_NUM_WORKERS] 1: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 1: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 1: [--data-impl {lazy,cached,mmap,infer}] 1: [--reset-position-ids] [--reset-attention-mask] 1: [--eod-mask-loss] [--loss-on-targets-only] 1: [--reweight-loss-based-on-position-frequency] 1: [--noise-density NOISE_DENSITY] 
1: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 1: [--adlr-autoresume] 1: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 1: [--ict-head-size ICT_HEAD_SIZE] 1: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 1: [--biencoder-shared-query-context-model] 1: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 1: [--titles-data-path TITLES_DATA_PATH] 1: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 1: [--use-one-sent-docs] 1: [--evidence-data-path EVIDENCE_DATA_PATH] 1: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 1: [--retriever-score-scaling] 1: [--block-data-path BLOCK_DATA_PATH] 1: [--embedding-path EMBEDDING_PATH] 1: [--indexer-batch-size INDEXER_BATCH_SIZE] 1: [--indexer-log-interval INDEXER_LOG_INTERVAL] 1: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 1: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 1: [--log-params-norm] [--log-num-zeros-in-grad] 1: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 1: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 1: [--log-timers-to-tensorboard] 1: [--log-batch-size-to-tensorboard] 1: [--no-log-learnig-rate-to-tensorboard] 1: [--no-log-loss-scale-to-tensorboard] 1: [--log-validation-ppl-to-tensorboard] 1: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 1: [--zero-contigious-gradients] 1: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 1: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 1: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 1: [--scattered-embeddings] [--split-transformers] 1: [--memory-centric-tiled-linear] 1: [--tile-factor TILE_FACTOR] 1: [--deepspeed-activation-checkpointing] 1: [--partition-activations] [--contigious-checkpointing] 1: [--checkpoint-in-cpu] [--synchronize-each-layer] 1: [--profile-backward] [--deepspeed] 1: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 1: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 1: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: 
usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: [--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 
2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths 
[VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: 
[--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 2: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type 
{PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: [--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples 
LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate 
SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: 
[--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 2: [--hidden-size HIDDEN_SIZE] 2: [--ffn-hidden-size FFN_HIDDEN_SIZE] 2: [--num-attention-heads NUM_ATTENTION_HEADS] 2: [--kv-channels KV_CHANNELS] 2: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 2: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 2: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 2: [--layernorm-epsilon LAYERNORM_EPSILON] 2: [--sync-tp-duplicated-parameters] 2: [--apply-residual-connection-post-layernorm] 2: [--embed-layernorm] [--openai-gelu] 2: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 2: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 2: [--glu-activation {geglu,liglu,reglu,swiglu}] 2: [--kill-switch-path KILL_SWITCH_PATH] 2: [--log-level {debug,info,warning,error,critical}] 2: [--log-level-replica {debug,info,warning,error,critical}] 2: [--attention-dropout ATTENTION_DROPOUT] 2: [--hidden-dropout HIDDEN_DROPOUT] 2: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 2: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 2: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 2: [--micro-batch-size MICRO_BATCH_SIZE] 2: [--batch-size BATCH_SIZE] 2: [--global-batch-size GLOBAL_BATCH_SIZE] 2: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 2: [--checkpoint-activations] 2: [--distribute-checkpointed-activations] 2: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 2: 
[--train-iters TRAIN_ITERS] 2: [--train-samples TRAIN_SAMPLES] 2: [--train-tokens TRAIN_TOKENS] 2: [--log-interval LOG_INTERVAL] 2: [--exit-interval EXIT_INTERVAL] 2: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 2: [--tensorboard-dir TENSORBOARD_DIR] 2: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 2: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 2: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 2: [--use-bnb-optimizer] 2: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 2: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 2: [--eval-only EVAL_ONLY] 2: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 2: [--inference] 2: [--abort-on-unmet-fused-kernel-constraints] 2: [--pp-partition-method PP_PARTITION_METHOD] 2: [--seed SEED] [--init-method-std INIT_METHOD_STD] 2: [--init-method-xavier-uniform] [--lr LR] 2: [--lr-decay-style {constant,linear,cosine}] 2: [--lr-decay-iters LR_DECAY_ITERS] 2: [--lr-decay-samples LR_DECAY_SAMPLES] 2: [--lr-decay-tokens LR_DECAY_TOKENS] 2: [--lr-warmup-fraction LR_WARMUP_FRACTION] 2: [--lr-warmup-iters LR_WARMUP_ITERS] 2: [--lr-warmup-samples LR_WARMUP_SAMPLES] 2: [--warmup WARMUP] [--min-lr MIN_LR] 2: [--override-lr-scheduler] 2: [--use-checkpoint-lr-scheduler] 2: [--universal-checkpoint] [--save SAVE] 2: [--save-interval SAVE_INTERVAL] [--no-save-optim] 2: [--no-save-rng] [--load LOAD] [--no-load-optim] 2: [--no-load-rng] [--finetune] [--fp16] [--bf16] 2: [--loss-scale LOSS_SCALE] 2: [--initial-loss-scale INITIAL_LOSS_SCALE] 2: [--min-loss-scale MIN_LOSS_SCALE] 2: [--loss-scale-window LOSS_SCALE_WINDOW] 2: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 2: [--no-query-key-layer-scaling] 2: [--attention-softmax-in-fp32] 2: [--accumulate-allreduce-grads-in-fp32] 2: [--fp16-lm-cross-entropy] 2: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 2: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 2: [--model-parallel-size MODEL_PARALLEL_SIZE] 2: 
[--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 2: [--distributed-backend {nccl,gloo}] 2: [--DDP-impl {local,torch}] 2: [--use-contiguous-buffers-in-ddp] 2: [--no-scatter-gather-tensors-in-pipeline] 2: [--local_rank LOCAL_RANK] 2: [--lazy-mpu-init LAZY_MPU_INIT] 2: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 2: [--eval-interval EVAL_INTERVAL] 2: [--data-path [DATA_PATH ...]] [--split SPLIT] 2: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 2: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 2: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 2: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 2: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 2: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 2: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 2: [--merge-file MERGE_FILE] 2: [--vocab-extra-ids VOCAB_EXTRA_IDS] 2: [--seq-length SEQ_LENGTH] 2: [--encoder-seq-length ENCODER_SEQ_LENGTH] 2: [--decoder-seq-length DECODER_SEQ_LENGTH] 2: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 2: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 2: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 2: [--num-workers NUM_WORKERS] 2: [--valid-num-workers VALID_NUM_WORKERS] 2: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 2: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 2: [--data-impl {lazy,cached,mmap,infer}] 2: [--reset-position-ids] [--reset-attention-mask] 2: [--eod-mask-loss] [--loss-on-targets-only] 2: [--reweight-loss-based-on-position-frequency] 2: [--noise-density NOISE_DENSITY] 2: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 2: [--adlr-autoresume] 2: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 2: [--ict-head-size ICT_HEAD_SIZE] 2: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 2: [--biencoder-shared-query-context-model] 2: [--ict-load ICT_LOAD] [--bert-load 
BERT_LOAD] 2: [--titles-data-path TITLES_DATA_PATH] 2: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 2: [--use-one-sent-docs] 2: [--evidence-data-path EVIDENCE_DATA_PATH] 2: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 2: [--retriever-score-scaling] 2: [--block-data-path BLOCK_DATA_PATH] 2: [--embedding-path EMBEDDING_PATH] 2: [--indexer-batch-size INDEXER_BATCH_SIZE] 2: [--indexer-log-interval INDEXER_LOG_INTERVAL] 2: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 2: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 2: [--log-params-norm] [--log-num-zeros-in-grad] 2: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 2: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 2: [--log-timers-to-tensorboard] 2: [--log-batch-size-to-tensorboard] 2: [--no-log-learnig-rate-to-tensorboard] 2: [--no-log-loss-scale-to-tensorboard] 2: [--log-validation-ppl-to-tensorboard] 2: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 2: [--zero-contigious-gradients] 2: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 2: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 2: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 2: [--scattered-embeddings] [--split-transformers] 2: [--memory-centric-tiled-linear] 2: [--tile-factor TILE_FACTOR] 2: [--deepspeed-activation-checkpointing] 2: [--partition-activations] [--contigious-checkpointing] 2: [--checkpoint-in-cpu] [--synchronize-each-layer] 2: [--profile-backward] [--deepspeed] 2: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 2: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 2: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5: [--kv-channels KV_CHANNELS] 5: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 5: [--make-vocab-size-divisible-by 
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 5: [--hidden-size HIDDEN_SIZE] 5: [--ffn-hidden-size FFN_HIDDEN_SIZE] 5: [--num-attention-heads NUM_ATTENTION_HEADS] 5:
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
3: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
5: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3:
[--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples 
LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length 
SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] 
[--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: 
[--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: 
[--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: 
[--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: 
[--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: 
[--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path 
VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size 
TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: 
[--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: 
[--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl 
{lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: 
[--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: 
[--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: 
[--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: 
[--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: 
[--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: 
[--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: [--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: 
[--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: [--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size 
ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 3: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 3: [--hidden-size HIDDEN_SIZE] 3: [--ffn-hidden-size FFN_HIDDEN_SIZE] 3: [--num-attention-heads NUM_ATTENTION_HEADS] 3: [--kv-channels KV_CHANNELS] 3: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 3: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 3: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 3: [--layernorm-epsilon LAYERNORM_EPSILON] 3: [--sync-tp-duplicated-parameters] 3: [--apply-residual-connection-post-layernorm] 3: [--embed-layernorm] [--openai-gelu] 3: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 3: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 3: [--glu-activation {geglu,liglu,reglu,swiglu}] 3: [--kill-switch-path KILL_SWITCH_PATH] 3: [--log-level {debug,info,warning,error,critical}] 3: [--log-level-replica {debug,info,warning,error,critical}] 3: [--attention-dropout ATTENTION_DROPOUT] 3: [--hidden-dropout HIDDEN_DROPOUT] 3: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 3: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 3: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 3: [--micro-batch-size MICRO_BATCH_SIZE] 3: [--batch-size BATCH_SIZE] 3: [--global-batch-size GLOBAL_BATCH_SIZE] 3: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 3: [--checkpoint-activations] 3: [--distribute-checkpointed-activations] 3: [--checkpoint-num-layers 
CHECKPOINT_NUM_LAYERS] 3: [--train-iters TRAIN_ITERS] 3: [--train-samples TRAIN_SAMPLES] 3: [--train-tokens TRAIN_TOKENS] 3: [--log-interval LOG_INTERVAL] 3: [--exit-interval EXIT_INTERVAL] 3: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 3: [--tensorboard-dir TENSORBOARD_DIR] 3: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 3: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 3: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 3: [--use-bnb-optimizer] 3: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 3: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 3: [--eval-only EVAL_ONLY] 3: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 3: [--inference] 3: [--abort-on-unmet-fused-kernel-constraints] 3: [--pp-partition-method PP_PARTITION_METHOD] 3: [--seed SEED] [--init-method-std INIT_METHOD_STD] 3: [--init-method-xavier-uniform] [--lr LR] 3: [--lr-decay-style {constant,linear,cosine}] 3: [--lr-decay-iters LR_DECAY_ITERS] 3: [--lr-decay-samples LR_DECAY_SAMPLES] 3: [--lr-decay-tokens LR_DECAY_TOKENS] 3: [--lr-warmup-fraction LR_WARMUP_FRACTION] 3: [--lr-warmup-iters LR_WARMUP_ITERS] 3: [--lr-warmup-samples LR_WARMUP_SAMPLES] 3: [--warmup WARMUP] [--min-lr MIN_LR] 3: [--override-lr-scheduler] 3: [--use-checkpoint-lr-scheduler] 3: [--universal-checkpoint] [--save SAVE] 3: [--save-interval SAVE_INTERVAL] [--no-save-optim] 3: [--no-save-rng] [--load LOAD] [--no-load-optim] 3: [--no-load-rng] [--finetune] [--fp16] [--bf16] 3: [--loss-scale LOSS_SCALE] 3: [--initial-loss-scale INITIAL_LOSS_SCALE] 3: [--min-loss-scale MIN_LOSS_SCALE] 3: [--loss-scale-window LOSS_SCALE_WINDOW] 3: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 3: [--no-query-key-layer-scaling] 3: [--attention-softmax-in-fp32] 3: [--accumulate-allreduce-grads-in-fp32] 3: [--fp16-lm-cross-entropy] 3: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 3: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 3: 
[--model-parallel-size MODEL_PARALLEL_SIZE] 3: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 3: [--distributed-backend {nccl,gloo}] 3: [--DDP-impl {local,torch}] 3: [--use-contiguous-buffers-in-ddp] 3: [--no-scatter-gather-tensors-in-pipeline] 3: [--local_rank LOCAL_RANK] 3: [--lazy-mpu-init LAZY_MPU_INIT] 3: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 3: [--eval-interval EVAL_INTERVAL] 3: [--data-path [DATA_PATH ...]] [--split SPLIT] 3: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 3: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 3: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 3: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 3: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 3: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 3: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 3: [--merge-file MERGE_FILE] 3: [--vocab-extra-ids VOCAB_EXTRA_IDS] 3: [--seq-length SEQ_LENGTH] 3: [--encoder-seq-length ENCODER_SEQ_LENGTH] 3: [--decoder-seq-length DECODER_SEQ_LENGTH] 3: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 3: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 3: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 3: [--num-workers NUM_WORKERS] 3: [--valid-num-workers VALID_NUM_WORKERS] 3: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 3: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 3: [--data-impl {lazy,cached,mmap,infer}] 3: [--reset-position-ids] [--reset-attention-mask] 3: [--eod-mask-loss] [--loss-on-targets-only] 3: [--reweight-loss-based-on-position-frequency] 3: [--noise-density NOISE_DENSITY] 3: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 3: [--adlr-autoresume] 3: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 3: [--ict-head-size ICT_HEAD_SIZE] 3: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 3: 
[--biencoder-shared-query-context-model] 3: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 3: [--titles-data-path TITLES_DATA_PATH] 3: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 3: [--use-one-sent-docs] 3: [--evidence-data-path EVIDENCE_DATA_PATH] 3: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 3: [--retriever-score-scaling] 3: [--block-data-path BLOCK_DATA_PATH] 3: [--embedding-path EMBEDDING_PATH] 3: [--indexer-batch-size INDEXER_BATCH_SIZE] 3: [--indexer-log-interval INDEXER_LOG_INTERVAL] 3: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 3: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 3: [--log-params-norm] [--log-num-zeros-in-grad] 3: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 3: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 3: [--log-timers-to-tensorboard] 3: [--log-batch-size-to-tensorboard] 3: [--no-log-learnig-rate-to-tensorboard] 3: [--no-log-loss-scale-to-tensorboard] 3: [--log-validation-ppl-to-tensorboard] 3: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 3: [--zero-contigious-gradients] 3: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 3: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 3: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 3: [--scattered-embeddings] [--split-transformers] 3: [--memory-centric-tiled-linear] 3: [--tile-factor TILE_FACTOR] 3: [--deepspeed-activation-checkpointing] 3: [--partition-activations] [--contigious-checkpointing] 3: [--checkpoint-in-cpu] [--synchronize-each-layer] 3: [--profile-backward] [--deepspeed] 3: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 3: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 3: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: 
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory 4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 4: [--hidden-size HIDDEN_SIZE] 4: [--ffn-hidden-size FFN_HIDDEN_SIZE] 4: [--num-attention-heads NUM_ATTENTION_HEADS] 4: [--kv-channels KV_CHANNELS] 4: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 4: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 4: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 4: [--layernorm-epsilon LAYERNORM_EPSILON] 4: [--sync-tp-duplicated-parameters] 4: [--apply-residual-connection-post-layernorm] 4: [--embed-layernorm] [--openai-gelu] 4: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 4: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 4: [--glu-activation {geglu,liglu,reglu,swiglu}] 4: [--kill-switch-path KILL_SWITCH_PATH] 4: [--log-level {debug,info,warning,error,critical}] 4: [--log-level-replica {debug,info,warning,error,critical}] 4: [--attention-dropout ATTENTION_DROPOUT] 4: [--hidden-dropout HIDDEN_DROPOUT] 4: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 4: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 4: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 4: [--micro-batch-size MICRO_BATCH_SIZE] 4: [--batch-size BATCH_SIZE] 4: [--global-batch-size GLOBAL_BATCH_SIZE] 4: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: [--checkpoint-activations] 4: [--distribute-checkpointed-activations] 4: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 4: [--train-iters TRAIN_ITERS] 4: [--train-samples TRAIN_SAMPLES] 4: [--train-tokens TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] 
[--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] [--no-save-optim] 4: [--no-save-rng] [--load LOAD] [--no-load-optim] 4: [--no-load-rng] [--finetune] [--fp16] [--bf16] 4: [--loss-scale LOSS_SCALE] 4: [--initial-loss-scale INITIAL_LOSS_SCALE] 4: [--min-loss-scale MIN_LOSS_SCALE] 4: [--loss-scale-window LOSS_SCALE_WINDOW] 4: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 4: [--no-query-key-layer-scaling] 4: [--attention-softmax-in-fp32] 4: [--accumulate-allreduce-grads-in-fp32] 4: [--fp16-lm-cross-entropy] 4: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 4: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 4: [--model-parallel-size MODEL_PARALLEL_SIZE] 4: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 4: [--distributed-backend {nccl,gloo}] 4: [--DDP-impl {local,torch}] 4: [--use-contiguous-buffers-in-ddp] 4: [--no-scatter-gather-tensors-in-pipeline] 4: [--local_rank LOCAL_RANK] 4: [--lazy-mpu-init LAZY_MPU_INIT] 4: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 4: [--eval-interval EVAL_INTERVAL] 4: [--data-path [DATA_PATH ...]] [--split SPLIT] 4: [--train-weighted-split-paths 
[TRAIN_WEIGHTED_SPLIT_PATHS ...]] 4: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 4: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 4: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 4: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 4: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 4: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 4: [--merge-file MERGE_FILE] 4: [--vocab-extra-ids VOCAB_EXTRA_IDS] 4: [--seq-length SEQ_LENGTH] 4: [--encoder-seq-length ENCODER_SEQ_LENGTH] 4: [--decoder-seq-length DECODER_SEQ_LENGTH] 4: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 4: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 4: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 4: [--num-workers NUM_WORKERS] 4: [--valid-num-workers VALID_NUM_WORKERS] 4: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 4: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 4: [--data-impl {lazy,cached,mmap,infer}] 4: [--reset-position-ids] [--reset-attention-mask] 4: [--eod-mask-loss] [--loss-on-targets-only] 4: [--reweight-loss-based-on-position-frequency] 4: [--noise-density NOISE_DENSITY] 4: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 4: [--adlr-autoresume] 4: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 4: [--ict-head-size ICT_HEAD_SIZE] 4: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 4: [--biencoder-shared-query-context-model] 4: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 4: [--titles-data-path TITLES_DATA_PATH] 4: [--query-in-block-prob QUERY_IN_BLOCK_PROB] 4: [--use-one-sent-docs] 4: [--evidence-data-path EVIDENCE_DATA_PATH] 4: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 4: [--retriever-score-scaling] 4: [--block-data-path BLOCK_DATA_PATH] 4: [--embedding-path EMBEDDING_PATH] 4: [--indexer-batch-size INDEXER_BATCH_SIZE] 4: [--indexer-log-interval 
INDEXER_LOG_INTERVAL] 4: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 4: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 4: [--log-params-norm] [--log-num-zeros-in-grad] 4: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 4: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 4: [--log-timers-to-tensorboard] 4: [--log-batch-size-to-tensorboard] 4: [--no-log-learnig-rate-to-tensorboard] 4: [--no-log-loss-scale-to-tensorboard] 4: [--log-validation-ppl-to-tensorboard] 4: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 4: [--zero-contigious-gradients] 4: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 4: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 4: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 4: [--scattered-embeddings] [--split-transformers] 4: [--memory-centric-tiled-linear] 4: [--tile-factor TILE_FACTOR] 4: [--deepspeed-activation-checkpointing] 4: [--partition-activations] [--contigious-checkpointing] 4: [--checkpoint-in-cpu] [--synchronize-each-layer] 4: [--profile-backward] [--deepspeed] 4: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 4: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS] 4: [--hidden-size HIDDEN_SIZE] 4: [--ffn-hidden-size FFN_HIDDEN_SIZE] 4: [--num-attention-heads NUM_ATTENTION_HEADS] 4: [--kv-channels KV_CHANNELS] 4: [--max-position-embeddings MAX_POSITION_EMBEDDINGS] 4: [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] 4: [--pad-vocab-size-to PAD_VOCAB_SIZE_TO] 4: [--layernorm-epsilon LAYERNORM_EPSILON] 4: [--sync-tp-duplicated-parameters] 4: [--apply-residual-connection-post-layernorm] 4: [--embed-layernorm] [--openai-gelu] 4: [--onnx-safe ONNX_SAFE] [--bert-no-binary-head] 4: [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}] 4: [--glu-activation {geglu,liglu,reglu,swiglu}] 4: [--kill-switch-path KILL_SWITCH_PATH] 4: [--log-level 
{debug,info,warning,error,critical}] 4: [--log-level-replica {debug,info,warning,error,critical}] 4: [--attention-dropout ATTENTION_DROPOUT] 4: [--hidden-dropout HIDDEN_DROPOUT] 4: [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD] 4: [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2] 4: [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM] 4: [--micro-batch-size MICRO_BATCH_SIZE] 4: [--batch-size BATCH_SIZE] 4: [--global-batch-size GLOBAL_BATCH_SIZE] 4: [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]] 4: [--checkpoint-activations] 4: [--distribute-checkpointed-activations] 4: [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] 4: [--train-iters TRAIN_ITERS] 4: [--train-samples TRAIN_SAMPLES] 4: [--train-tokens TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] 
4:                        [--no-save-optim]
4:                        [--no-save-rng] [--load LOAD] [--no-load-optim]
4:                        [--no-load-rng] [--finetune] [--fp16] [--bf16]
4:                        [--loss-scale LOSS_SCALE]
4:                        [--initial-loss-scale INITIAL_LOSS_SCALE]
4:                        [--min-loss-scale MIN_LOSS_SCALE]
4:                        [--loss-scale-window LOSS_SCALE_WINDOW]
4:                        [--hysteresis HYSTERESIS] [--fp32-residual-connection]
4:                        [--no-query-key-layer-scaling]
4:                        [--attention-softmax-in-fp32]
4:                        [--accumulate-allreduce-grads-in-fp32]
4:                        [--fp16-lm-cross-entropy]
4:                        [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE]
4:                        [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
4:                        [--model-parallel-size MODEL_PARALLEL_SIZE]
4:                        [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE]
4:                        [--distributed-backend {nccl,gloo}]
4:                        [--DDP-impl {local,torch}]
4:                        [--use-contiguous-buffers-in-ddp]
4:                        [--no-scatter-gather-tensors-in-pipeline]
4:                        [--local_rank LOCAL_RANK]
4:                        [--lazy-mpu-init LAZY_MPU_INIT]
4:                        [--use-cpu-initialization] [--eval-iters EVAL_ITERS]
4:                        [--eval-interval EVAL_INTERVAL]
4:                        [--data-path [DATA_PATH ...]] [--split SPLIT]
4:                        [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]]
4:                        [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH]
4:                        [--log-path LOG_PATH] [--vocab-file VOCAB_FILE]
4:                        [--merge-file MERGE_FILE]
4:                        [--vocab-extra-ids VOCAB_EXTRA_IDS]
4:                        [--seq-length SEQ_LENGTH]
4:                        [--encoder-seq-length ENCODER_SEQ_LENGTH]
4:                        [--decoder-seq-length DECODER_SEQ_LENGTH]
4:                        [--retriever-seq-length RETRIEVER_SEQ_LENGTH]
4:                        [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB]
4:                        [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup]
4:                        [--num-workers NUM_WORKERS]
4:                        [--valid-num-workers VALID_NUM_WORKERS]
4:                        [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}]
4:                        [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH]
4:                        [--data-impl {lazy,cached,mmap,infer}]
4:                        [--reset-position-ids] [--reset-attention-mask]
4:                        [--eod-mask-loss] [--loss-on-targets-only]
4:                        [--reweight-loss-based-on-position-frequency]
4:                        [--noise-density NOISE_DENSITY]
4:                        [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH]
4:                        [--adlr-autoresume]
4:                        [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL]
4:                        [--ict-head-size ICT_HEAD_SIZE]
4:                        [--biencoder-projection-dim BIENCODER_PROJECTION_DIM]
4:                        [--biencoder-shared-query-context-model]
4:                        [--ict-load ICT_LOAD] [--bert-load BERT_LOAD]
4:                        [--titles-data-path TITLES_DATA_PATH]
4:                        [--query-in-block-prob QUERY_IN_BLOCK_PROB]
4:                        [--use-one-sent-docs]
4:                        [--evidence-data-path EVIDENCE_DATA_PATH]
4:                        [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]]
4:                        [--retriever-score-scaling]
4:                        [--block-data-path BLOCK_DATA_PATH]
4:                        [--embedding-path EMBEDDING_PATH]
4:                        [--indexer-batch-size INDEXER_BATCH_SIZE]
4:                        [--indexer-log-interval INDEXER_LOG_INTERVAL]
4:                        [--num-classes NUM_CLASSES] [--img-dim IMG_DIM]
4:                        [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM]
4:                        [--log-params-norm] [--log-num-zeros-in-grad]
4:                        [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL]
4:                        [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE]
4:                        [--log-timers-to-tensorboard]
4:                        [--log-batch-size-to-tensorboard]
4:                        [--no-log-learnig-rate-to-tensorboard]
4:                        [--no-log-loss-scale-to-tensorboard]
4:                        [--log-validation-ppl-to-tensorboard]
4:                        [--zero-stage ZERO_STAGE] [--zero-reduce-scatter]
4:                        [--zero-contigious-gradients]
4:                        [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE]
4:                        [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE]
4:                        [--remote-device {none,cpu,nvme}] [--use-pin-memory]
4:                        [--scattered-embeddings] [--split-transformers]
4:                        [--memory-centric-tiled-linear]
4:                        [--tile-factor TILE_FACTOR]
4:                        [--deepspeed-activation-checkpointing]
4:                        [--partition-activations] [--contigious-checkpointing]
4:                        [--checkpoint-in-cpu] [--synchronize-each-layer]
4:                        [--profile-backward] [--deepspeed]
4:                        [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale]
4:                        [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
4: pretrain_gpt.py: error: unrecognized arguments: --reset-progress
4: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
4: usage: pretrain_gpt.py [-h] [--num-layers NUM_LAYERS]
4:                        [--hidden-size HIDDEN_SIZE]
4:                        [--ffn-hidden-size FFN_HIDDEN_SIZE]
4:                        [--num-attention-heads NUM_ATTENTION_HEADS]
4:                        [--kv-channels KV_CHANNELS]
4:                        [--max-position-embeddings MAX_POSITION_EMBEDDINGS]
4:                        [--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY]
4:                        [--pad-vocab-size-to PAD_VOCAB_SIZE_TO]
4:                        [--layernorm-epsilon LAYERNORM_EPSILON]
4:                        [--sync-tp-duplicated-parameters]
4:                        [--apply-residual-connection-post-layernorm]
4:                        [--embed-layernorm] [--openai-gelu]
4:                        [--onnx-safe ONNX_SAFE] [--bert-no-binary-head]
4:                        [--position-embedding-type {PositionEmbeddingType.rotary,PositionEmbeddingType.absolute,PositionEmbeddingType.alibi}]
4:                        [--glu-activation {geglu,liglu,reglu,swiglu}]
4:                        [--kill-switch-path KILL_SWITCH_PATH]
4:                        [--log-level {debug,info,warning,error,critical}]
4:                        [--log-level-replica {debug,info,warning,error,critical}]
4:                        [--attention-dropout ATTENTION_DROPOUT]
4:                        [--hidden-dropout HIDDEN_DROPOUT]
4:                        [--weight-decay WEIGHT_DECAY] [--clip-grad CLIP_GRAD]
4:                        [--adam-beta1 ADAM_BETA1] [--adam-beta2 ADAM_BETA2]
4:                        [--adam-eps ADAM_EPS] [--sgd-momentum SGD_MOMENTUM]
4:                        [--micro-batch-size MICRO_BATCH_SIZE]
4:                        [--batch-size BATCH_SIZE]
4:                        [--global-batch-size GLOBAL_BATCH_SIZE]
4:                        [--rampup-batch-size [RAMPUP_BATCH_SIZE ...]]
4:                        [--checkpoint-activations]
4:                        [--distribute-checkpointed-activations]
4:                        [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS]
4:                        [--train-iters TRAIN_ITERS]
4:                        [--train-samples TRAIN_SAMPLES]
4:                        [--train-tokens
TRAIN_TOKENS] 4: [--log-interval LOG_INTERVAL] 4: [--exit-interval EXIT_INTERVAL] 4: [--exit-duration-in-mins EXIT_DURATION_IN_MINS] 4: [--tensorboard-dir TENSORBOARD_DIR] 4: [--no-masked-softmax-fusion] [--no-bias-gelu-fusion] 4: [--no-bias-dropout-fusion] [--no-layer-norm-fusion] 4: [--no-optimizer-fusion] [--optimizer {adam,sgd}] 4: [--use-bnb-optimizer] 4: [--dataloader-type {single,cyclic}] [--cpu-optimizer] 4: [--cpu_torch_adam] [--codecarbon-dir CODECARBON_DIR] 4: [--eval-only EVAL_ONLY] 4: [--skip-train-iteration-range SKIP_TRAIN_ITERATION_RANGE [SKIP_TRAIN_ITERATION_RANGE ...]] 4: [--inference] 4: [--abort-on-unmet-fused-kernel-constraints] 4: [--pp-partition-method PP_PARTITION_METHOD] 4: [--seed SEED] [--init-method-std INIT_METHOD_STD] 4: [--init-method-xavier-uniform] [--lr LR] 4: [--lr-decay-style {constant,linear,cosine}] 4: [--lr-decay-iters LR_DECAY_ITERS] 4: [--lr-decay-samples LR_DECAY_SAMPLES] 4: [--lr-decay-tokens LR_DECAY_TOKENS] 4: [--lr-warmup-fraction LR_WARMUP_FRACTION] 4: [--lr-warmup-iters LR_WARMUP_ITERS] 4: [--lr-warmup-samples LR_WARMUP_SAMPLES] 4: [--warmup WARMUP] [--min-lr MIN_LR] 4: [--override-lr-scheduler] 4: [--use-checkpoint-lr-scheduler] 4: [--universal-checkpoint] [--save SAVE] 4: [--save-interval SAVE_INTERVAL] [--no-save-optim] 4: [--no-save-rng] [--load LOAD] [--no-load-optim] 4: [--no-load-rng] [--finetune] [--fp16] [--bf16] 4: [--loss-scale LOSS_SCALE] 4: [--initial-loss-scale INITIAL_LOSS_SCALE] 4: [--min-loss-scale MIN_LOSS_SCALE] 4: [--loss-scale-window LOSS_SCALE_WINDOW] 4: [--hysteresis HYSTERESIS] [--fp32-residual-connection] 4: [--no-query-key-layer-scaling] 4: [--attention-softmax-in-fp32] 4: [--accumulate-allreduce-grads-in-fp32] 4: [--fp16-lm-cross-entropy] 4: [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] 4: [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE] 4: [--model-parallel-size MODEL_PARALLEL_SIZE] 4: [--num-layers-per-virtual-pipeline-stage NUM_LAYERS_PER_VIRTUAL_PIPELINE_STAGE] 
4: [--distributed-backend {nccl,gloo}] 4: [--DDP-impl {local,torch}] 4: [--use-contiguous-buffers-in-ddp] 4: [--no-scatter-gather-tensors-in-pipeline] 4: [--local_rank LOCAL_RANK] 4: [--lazy-mpu-init LAZY_MPU_INIT] 4: [--use-cpu-initialization] [--eval-iters EVAL_ITERS] 4: [--eval-interval EVAL_INTERVAL] 4: [--data-path [DATA_PATH ...]] [--split SPLIT] 4: [--train-weighted-split-paths [TRAIN_WEIGHTED_SPLIT_PATHS ...]] 4: [--valid-weighted-split-paths [VALID_WEIGHTED_SPLIT_PATHS ...]] 4: [--test-weighted-split-paths [TEST_WEIGHTED_SPLIT_PATHS ...]] 4: [--train-weighted-split-paths-path TRAIN_WEIGHTED_SPLIT_PATHS_PATH] 4: [--valid-weighted-split-paths-path VALID_WEIGHTED_SPLIT_PATHS_PATH] 4: [--test-weighted-split-paths-path TEST_WEIGHTED_SPLIT_PATHS_PATH] 4: [--log-path LOG_PATH] [--vocab-file VOCAB_FILE] 4: [--merge-file MERGE_FILE] 4: [--vocab-extra-ids VOCAB_EXTRA_IDS] 4: [--seq-length SEQ_LENGTH] 4: [--encoder-seq-length ENCODER_SEQ_LENGTH] 4: [--decoder-seq-length DECODER_SEQ_LENGTH] 4: [--retriever-seq-length RETRIEVER_SEQ_LENGTH] 4: [--sample-rate SAMPLE_RATE] [--mask-prob MASK_PROB] 4: [--short-seq-prob SHORT_SEQ_PROB] [--mmap-warmup] 4: [--num-workers NUM_WORKERS] 4: [--valid-num-workers VALID_NUM_WORKERS] 4: [--tokenizer-type {BertWordPieceLowerCase,BertWordPieceCase,GPT2BPETokenizer,PretrainedFromHF}] 4: [--tokenizer-name-or-path TOKENIZER_NAME_OR_PATH] 4: [--data-impl {lazy,cached,mmap,infer}] 4: [--reset-position-ids] [--reset-attention-mask] 4: [--eod-mask-loss] [--loss-on-targets-only] 4: [--reweight-loss-based-on-position-frequency] 4: [--noise-density NOISE_DENSITY] 4: [--mean-noise-span-length MEAN_NOISE_SPAN_LENGTH] 4: [--adlr-autoresume] 4: [--adlr-autoresume-interval ADLR_AUTORESUME_INTERVAL] 4: [--ict-head-size ICT_HEAD_SIZE] 4: [--biencoder-projection-dim BIENCODER_PROJECTION_DIM] 4: [--biencoder-shared-query-context-model] 4: [--ict-load ICT_LOAD] [--bert-load BERT_LOAD] 4: [--titles-data-path TITLES_DATA_PATH] 4: [--query-in-block-prob 
QUERY_IN_BLOCK_PROB] 4: [--use-one-sent-docs] 4: [--evidence-data-path EVIDENCE_DATA_PATH] 4: [--retriever-report-topk-accuracies RETRIEVER_REPORT_TOPK_ACCURACIES [RETRIEVER_REPORT_TOPK_ACCURACIES ...]] 4: [--retriever-score-scaling] 4: [--block-data-path BLOCK_DATA_PATH] 4: [--embedding-path EMBEDDING_PATH] 4: [--indexer-batch-size INDEXER_BATCH_SIZE] 4: [--indexer-log-interval INDEXER_LOG_INTERVAL] 4: [--num-classes NUM_CLASSES] [--img-dim IMG_DIM] 4: [--num-channels NUM_CHANNELS] [--patch-dim PATCH_DIM] 4: [--log-params-norm] [--log-num-zeros-in-grad] 4: [--tensorboard-log-interval TENSORBOARD_LOG_INTERVAL] 4: [--tensorboard-queue-size TENSORBOARD_QUEUE_SIZE] 4: [--log-timers-to-tensorboard] 4: [--log-batch-size-to-tensorboard] 4: [--no-log-learnig-rate-to-tensorboard] 4: [--no-log-loss-scale-to-tensorboard] 4: [--log-validation-ppl-to-tensorboard] 4: [--zero-stage ZERO_STAGE] [--zero-reduce-scatter] 4: [--zero-contigious-gradients] 4: [--zero-reduce-bucket-size ZERO_REDUCE_BUCKET_SIZE] 4: [--zero-allgather-bucket-size ZERO_ALLGATHER_BUCKET_SIZE] 4: [--remote-device {none,cpu,nvme}] [--use-pin-memory] 4: [--scattered-embeddings] [--split-transformers] 4: [--memory-centric-tiled-linear] 4: [--tile-factor TILE_FACTOR] 4: [--deepspeed-activation-checkpointing] 4: [--partition-activations] [--contigious-checkpointing] 4: [--checkpoint-in-cpu] [--synchronize-each-layer] 4: [--profile-backward] [--deepspeed] 4: [--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] 4: [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi] 4: pretrain_gpt.py: error: unrecognized arguments: --reset-progress 2: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 68510) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python 6: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 2252) of binary: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
5: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 40052) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
7: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 58809) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
0: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 55346) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
1: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 52416) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
4: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 120066) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
3: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 12206) of binary: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/bin/python
3: Traceback (most recent call last):
3:   File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 197, in _run_module_as_main
3:     return _run_code(code, main_globals, None,
3:   File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/runpy.py", line 87, in _run_code
3:     exec(code, run_globals)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 766, in <module>
3:     main()
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
3:     return f(*args, **kwargs)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main
3:     run(args)
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
3:     elastic_launch(
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
3:     return launch_agent(self._config, self._entrypoint, list(args))
3:   File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
3:     raise ChildFailedError(
3: torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
3: ============================================================
3: Megatron-DeepSpeed/pretrain_gpt.py FAILED
3: ------------------------------------------------------------
3: Failures:
3:   [1]: time: 2023-04-24_12:09:29, host: nid006911, rank: 25 (local_rank: 1), exitcode: 2 (pid: 12207)
3:   [2]: time: 2023-04-24_12:09:29, host: nid006911, rank: 26 (local_rank: 2), exitcode: 2 (pid: 12208)
3:   [3]: time: 2023-04-24_12:09:29, host: nid006911, rank: 27 (local_rank: 3), exitcode: 2 (pid: 12209)
3:   [4]: time: 2023-04-24_12:09:29, host: nid006911, rank: 28 (local_rank: 4), exitcode: 2 (pid: 12210)
3:   [5]: time: 2023-04-24_12:09:29, host: nid006911, rank: 29 (local_rank: 5), exitcode: 2 (pid: 12211)
3:   [6]: time: 2023-04-24_12:09:29, host: nid006911, rank: 30 (local_rank: 6), exitcode: 2 (pid: 12212)
3:   [7]: time: 2023-04-24_12:09:29, host: nid006911, rank: 31 (local_rank: 7), exitcode: 2 (pid: 12213)
3: ------------------------------------------------------------
3: Root Cause (first observed failure):
3:   [0]: time: 2023-04-24_12:09:29, host: nid006911, rank: 24 (local_rank: 0), exitcode: 2 (pid: 12206)
3:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
3: ============================================================
1: ============================================================
1: Megatron-DeepSpeed/pretrain_gpt.py FAILED
1: ------------------------------------------------------------
1: Failures:
1:   [1]: time: 2023-04-24_12:09:29, host: nid006909, rank: 9 (local_rank: 1), exitcode: 2 (pid: 52417)
1:   [2]: time: 2023-04-24_12:09:29, host: nid006909, rank: 10 (local_rank: 2), exitcode: 2 (pid: 52418)
1:   [3]: time: 2023-04-24_12:09:29, host: nid006909, rank: 11 (local_rank: 3), exitcode: 2 (pid: 52419)
1:   [4]: time: 2023-04-24_12:09:29, host: nid006909, rank: 12 (local_rank: 4), exitcode: 2 (pid: 52420)
1:   [5]: time: 2023-04-24_12:09:29, host: nid006909, rank: 13 (local_rank: 5), exitcode: 2 (pid: 52421)
1:   [6]: time: 2023-04-24_12:09:29, host: nid006909, rank: 14 (local_rank: 6), exitcode: 2 (pid: 52422)
1:   [7]: time: 2023-04-24_12:09:29, host: nid006909, rank: 15 (local_rank: 7), exitcode: 2 (pid: 52423)
1: ------------------------------------------------------------
1: Root Cause (first observed failure):
1:   [0]: time: 2023-04-24_12:09:29, host: nid006909, rank: 8 (local_rank: 0), exitcode: 2 (pid: 52416)
1:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
1: ============================================================
2: ============================================================
2: Megatron-DeepSpeed/pretrain_gpt.py FAILED
2: ------------------------------------------------------------
2: Failures:
2:   [1]: time: 2023-04-24_12:09:29, host: nid006910, rank: 17 (local_rank: 1), exitcode: 2 (pid: 68511)
2:   [2]: time: 2023-04-24_12:09:29, host: nid006910, rank: 18 (local_rank: 2), exitcode: 2 (pid: 68512)
2:   [3]: time: 2023-04-24_12:09:29, host: nid006910, rank: 19 (local_rank: 3), exitcode: 2 (pid: 68513)
2:   [4]: time: 2023-04-24_12:09:29, host: nid006910, rank: 20 (local_rank: 4), exitcode: 2 (pid: 68514)
2:   [5]: time: 2023-04-24_12:09:29, host: nid006910, rank: 21 (local_rank: 5), exitcode: 2 (pid: 68515)
2:   [6]: time: 2023-04-24_12:09:29, host: nid006910, rank: 22 (local_rank: 6), exitcode: 2 (pid: 68516)
2:   [7]: time: 2023-04-24_12:09:29, host: nid006910, rank: 23 (local_rank: 7), exitcode: 2 (pid: 68517)
2: ------------------------------------------------------------
2: Root Cause (first observed failure):
2:   [0]: time: 2023-04-24_12:09:29, host: nid006910, rank: 16 (local_rank: 0), exitcode: 2 (pid: 68510)
2:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
2: ============================================================
5: ============================================================
5: Megatron-DeepSpeed/pretrain_gpt.py FAILED
5: ------------------------------------------------------------
5: Failures:
5:   [1]: time: 2023-04-24_12:09:29, host: nid006913, rank: 41 (local_rank: 1), exitcode: 2 (pid: 40053)
5:   [2]: time: 2023-04-24_12:09:29, host: nid006913, rank: 42 (local_rank: 2), exitcode: 2 (pid: 40054)
5:   [3]: time: 2023-04-24_12:09:29, host: nid006913, rank: 43 (local_rank: 3), exitcode: 2 (pid: 40055)
5:   [4]: time: 2023-04-24_12:09:29, host: nid006913, rank: 44 (local_rank: 4), exitcode: 2 (pid: 40056)
5:   [5]: time: 2023-04-24_12:09:29, host: nid006913, rank: 45 (local_rank: 5), exitcode: 2 (pid: 40057)
5:   [6]: time: 2023-04-24_12:09:29, host: nid006913, rank: 46 (local_rank: 6), exitcode: 2 (pid: 40058)
5:   [7]: time: 2023-04-24_12:09:29, host: nid006913, rank: 47 (local_rank: 7), exitcode: 2 (pid: 40059)
5: ------------------------------------------------------------
5: Root Cause (first observed failure):
5:   [0]: time: 2023-04-24_12:09:29, host: nid006913, rank: 40 (local_rank: 0), exitcode: 2 (pid: 40052)
5:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
5: ============================================================
4: ============================================================
4: Megatron-DeepSpeed/pretrain_gpt.py FAILED
4: ------------------------------------------------------------
4: Failures:
4:   [1]: time: 2023-04-24_12:09:29, host: nid006912, rank: 33 (local_rank: 1), exitcode: 2 (pid: 120067)
4:   [2]: time: 2023-04-24_12:09:29, host: nid006912, rank: 34 (local_rank: 2), exitcode: 2 (pid: 120068)
4:   [3]: time: 2023-04-24_12:09:29, host: nid006912, rank: 35 (local_rank: 3), exitcode: 2 (pid: 120069)
4:   [4]: time: 2023-04-24_12:09:29, host: nid006912, rank: 36 (local_rank: 4), exitcode: 2 (pid: 120076)
4:   [5]: time: 2023-04-24_12:09:29, host: nid006912, rank: 37 (local_rank: 5), exitcode: 2 (pid: 120078)
4:   [6]: time: 2023-04-24_12:09:29, host: nid006912, rank: 38 (local_rank: 6), exitcode: 2 (pid: 120079)
4:   [7]: time: 2023-04-24_12:09:29, host: nid006912, rank: 39 (local_rank: 7), exitcode: 2 (pid: 120080)
4: ------------------------------------------------------------
4: Root Cause (first observed failure):
4:   [0]: time: 2023-04-24_12:09:29, host: nid006912, rank: 32 (local_rank: 0), exitcode: 2 (pid: 120066)
4:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
4: ============================================================
0: ============================================================
0: Megatron-DeepSpeed/pretrain_gpt.py FAILED
0: ------------------------------------------------------------
0: Failures:
0:   [1]: time: 2023-04-24_12:09:29, host: nid006908, rank: 1 (local_rank: 1), exitcode: 2 (pid: 55347)
0:   [2]: time: 2023-04-24_12:09:29, host: nid006908, rank: 2 (local_rank: 2), exitcode: 2 (pid: 55348)
0:   [3]: time: 2023-04-24_12:09:29, host: nid006908, rank: 3 (local_rank: 3), exitcode: 2 (pid: 55349)
0:   [4]: time: 2023-04-24_12:09:29, host: nid006908, rank: 4 (local_rank: 4), exitcode: 2 (pid: 55350)
0:   [5]: time: 2023-04-24_12:09:29, host: nid006908, rank: 5 (local_rank: 5), exitcode: 2 (pid: 55351)
0:   [6]: time: 2023-04-24_12:09:29, host: nid006908, rank: 6 (local_rank: 6), exitcode: 2 (pid: 55352)
0:   [7]: time: 2023-04-24_12:09:29, host: nid006908, rank: 7 (local_rank: 7), exitcode: 2 (pid: 55353)
0: ------------------------------------------------------------
0: Root Cause (first observed failure):
0:   [0]: time: 2023-04-24_12:09:29, host: nid006908, rank: 0 (local_rank: 0), exitcode: 2 (pid: 55346)
0:   error_file: <N/A>; traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
0: ============================================================
7: ============================================================
7: Megatron-DeepSpeed/pretrain_gpt.py FAILED
7: ------------------------------------------------------------
7: Failures:
7:   [1]: time: 2023-04-24_12:09:29, host: nid006915, rank: 57 (local_rank: 1), exitcode: 2 (pid: 58810)
7:   [2]: time: 2023-04-24_12:09:29, host: nid006915, rank: 58 (local_rank: 2), exitcode: 2 (pid: 58811)
7:   [3]: time: 2023-04-24_12:09:29, host: nid006915, rank: 59 (local_rank: 3), exitcode: 2 (pid: 58812)
7:   [4]: time: 2023-04-24_12:09:29, host: nid006915, rank: 60 (local_rank: 4), exitcode: 2 (pid: 58813)
7:   [5]: time: 2023-04-24_12:09:29, host: nid006915, rank: 61 (local_rank: 5), exitcode: 2 (pid: 58814)
7:   [6]: time: 2023-04-24_12:09:29, host: nid006915, rank: 62 (local_rank: 6), exitcode: 2 (pid: 58815)
7:   [7]: time: 2023-04-24_12:09:29, host: nid006915, rank: 63 (local_rank: 7), exitcode: 2 (pid: 58816)
7: ------------------------------------------------------------
7: Root Cause (first observed failure):
7:   [0]: time :
2023-04-24_12:09:29 7: host : nid006915 7: rank : 56 (local_rank: 0) 7: exitcode : 2 (pid: 58809) 7: error_file: 7: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 7: ============================================================ 6: run(args) 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run 6: elastic_launch( 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__ 6: return launch_agent(self._config, self._entrypoint, list(args)) 6: File "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent 6: raise ChildFailedError( 6: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 6: ============================================================ 6: Megatron-DeepSpeed/pretrain_gpt.py FAILED 6: ------------------------------------------------------------ 6: Failures: 6: [1]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 49 (local_rank: 1) 6: exitcode : 2 (pid: 2253) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [2]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 50 (local_rank: 2) 6: exitcode : 2 (pid: 2254) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [3]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 51 (local_rank: 3) 6: exitcode : 2 (pid: 2255) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [4]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 52 (local_rank: 4) 6: exitcode : 2 (pid: 2256) 6: error_file: 6: traceback : To enable traceback see: 
https://pytorch.org/docs/stable/elastic/errors.html 6: [5]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 53 (local_rank: 5) 6: exitcode : 2 (pid: 2258) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [6]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 54 (local_rank: 6) 6: exitcode : 2 (pid: 2259) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: [7]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 55 (local_rank: 7) 6: exitcode : 2 (pid: 2260) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: ------------------------------------------------------------ 6: Root Cause (first observed failure): 6: [0]: 6: time : 2023-04-24_12:09:29 6: host : nid006914 6: rank : 48 (local_rank: 0) 6: exitcode : 2 (pid: 2252) 6: error_file: 6: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html 6: ============================================================ srun: error: nid006915: task 7: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=3406547.0 srun: error: nid006911: task 3: Exited with exit code 1 srun: error: nid006912: task 4: Exited with exit code 1 srun: error: nid006909: task 1: Exited with exit code 1 srun: error: nid006913: task 5: Exited with exit code 1 0: slurmstepd: error: *** STEP 3406547.0 ON nid006908 CANCELLED AT 2023-04-24T12:09:30 *** srun: error: nid006910: task 2: Exited with exit code 1 srun: error: nid006914: task 6: Exited with exit code 1 srun: error: nid006908: task 0: Terminated srun: Force Terminated StepId=3406547.0
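The `0:`/`4:` prefixes on each line come from srun's `--label` (`-l`) option, which tags every stdout line with its originating task number, and every torchelastic failure record shares the same host/rank/exitcode/pid shape. A stdlib-only sketch of how such a log can be de-interleaved and tabulated for triage (`stream_for_task` and `parse_failures` are hypothetical helpers, not part of torch, Slurm, or Megatron-DeepSpeed):

```python
import re

# Matches the torchelastic ChildFailedError failure-record layout seen in
# the log above, after the srun task labels have been stripped.
RECORD = re.compile(
    r"host\s+:\s+(?P<host>\S+)\s*\n"
    r"\s*rank\s+:\s+(?P<rank>\d+)\s+\(local_rank:\s*(?P<local_rank>\d+)\)\s*\n"
    r"\s*exitcode\s+:\s+(?P<exitcode>-?\d+)\s+\(pid:\s*(?P<pid>\d+)\)"
)

def stream_for_task(log: str, task: int) -> str:
    """Recover one task's stdout from an srun --label interleaved log."""
    prefix = f"{task}: "
    return "\n".join(
        line[len(prefix):] for line in log.splitlines() if line.startswith(prefix)
    )

def parse_failures(text: str) -> list[dict]:
    """Return one dict per failure record found in the de-interleaved text."""
    return [m.groupdict() for m in RECORD.finditer(text)]

# Tiny excerpt in the shape of the log above (tasks 0 and 7 interleaved):
log = """\
0:   host      : nid006908
7:   host      : nid006915
0:   rank      : 1 (local_rank: 1)
0:   exitcode  : 2 (pid: 55347)
"""

for rec in parse_failures(stream_for_task(log, 0)):
    print(rec["host"], rec["rank"], rec["exitcode"], rec["pid"])
```

Filtering by label first matters: records interrupted by another task's lines (as throughout the log above) would otherwise break the multi-line regex match.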