Raylet is terminated. Termination is unexpected. Possible reasons include: (1) SIGKILL by the user or system OOM killer, (2) Invalid memory access from Raylet causing SIGSEGV or SIGBUS, (3) Other termination signals. Last 20 lines of the Raylet logs:
[2025-01-15 22:09:24,253 I 14019 14019] (raylet) main.cc:258: Shutting down...
[2025-01-15 22:09:24,253 I 14019 14019] (raylet) accessor.cc:510: Unregistering node node_id=8c1933048df819b7d290635b4879245abb3bf91c2ebe5860747d648a
[2025-01-15 22:09:24,256 I 14019 14019] (raylet) accessor.cc:762: Received notification for node, IsAlive = 0 node_id=8c1933048df819b7d290635b4879245abb3bf91c2ebe5860747d648a
[2025-01-15 22:09:24,293 C 14019 14019] (raylet) node_manager.cc:1043: [Timeout] Exiting because this node manager has mistakenly been marked as dead by the GCS: GCS failed to check the health of this node for 5 times. This is likely because the machine or raylet has become overloaded.
*** StackTrace Information ***
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbdf73a) [0x55f20d06173a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbe1b21) [0x55f20d063b21] ray::RayLog::~RayLog()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x323299) [0x55f20c7a5299] ray::raylet::NodeManager::NodeRemoved()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x536e69) [0x55f20c9b8e69] ray::gcs::NodeInfoAccessor::HandleNotification()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x669e98) [0x55f20caebe98] EventTracker::RecordExecution()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x664e8e) [0x55f20cae6e8e] std::_Function_handler<>::_M_invoke()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x665306) [0x55f20cae7306] boost::asio::detail::completion_handler<>::do_complete()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc53f9b) [0x55f20d0d5f9b] boost::asio::detail::scheduler::do_run_one()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56529) [0x55f20d0d8529] boost::asio::detail::scheduler::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56a42) [0x55f20d0d8a42] boost::asio::io_context::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x1e9155) [0x55f20c66b155] main
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f3cf7e48d90]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f3cf7e48e40] __libc_start_main
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x243277) [0x55f20c6c5277]

Failed to publish error: Raylet is terminated. Termination is unexpected. Possible reasons include: (1) SIGKILL by the user or system OOM killer, (2) Invalid memory access from Raylet causing SIGSEGV or SIGBUS, (3) Other termination signals. Last 20 lines of the Raylet logs:
[2025-01-15 22:09:24,253 I 14019 14019] (raylet) main.cc:258: Shutting down...
[2025-01-15 22:09:24,253 I 14019 14019] (raylet) accessor.cc:510: Unregistering node node_id=8c1933048df819b7d290635b4879245abb3bf91c2ebe5860747d648a
[2025-01-15 22:09:24,256 I 14019 14019] (raylet) accessor.cc:762: Received notification for node, IsAlive = 0 node_id=8c1933048df819b7d290635b4879245abb3bf91c2ebe5860747d648a
[2025-01-15 22:09:24,293 C 14019 14019] (raylet) node_manager.cc:1043: [Timeout] Exiting because this node manager has mistakenly been marked as dead by the GCS: GCS failed to check the health of this node for 5 times. This is likely because the machine or raylet has become overloaded.
*** StackTrace Information ***
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbdf73a) [0x55f20d06173a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbe1b21) [0x55f20d063b21] ray::RayLog::~RayLog()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x323299) [0x55f20c7a5299] ray::raylet::NodeManager::NodeRemoved()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x536e69) [0x55f20c9b8e69] ray::gcs::NodeInfoAccessor::HandleNotification()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x669e98) [0x55f20caebe98] EventTracker::RecordExecution()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x664e8e) [0x55f20cae6e8e] std::_Function_handler<>::_M_invoke()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x665306) [0x55f20cae7306] boost::asio::detail::completion_handler<>::do_complete()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc53f9b) [0x55f20d0d5f9b] boost::asio::detail::scheduler::do_run_one()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56529) [0x55f20d0d8529] boost::asio::detail::scheduler::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56a42) [0x55f20d0d8a42] boost::asio::io_context::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x1e9155) [0x55f20c66b155] main
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f3cf7e48d90]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f3cf7e48e40] __libc_start_main
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x243277) [0x55f20c6c5277]
[type raylet_died]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/utils.py", line 207, in publish_error_to_driver
    gcs_publisher.publish_error(
  File "python/ray/_raylet.pyx", line 3099, in ray._raylet.GcsPublisher.publish_error
  File "python/ray/includes/common.pxi", line 81, in ray._raylet.check_status
ray.exceptions.GetTimeoutError: Failed to publish after retries: failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.0.2:55632: Failed to connect to remote host: Connection refused