Raylet is terminated. Termination is unexpected. Possible reasons include: (1) SIGKILL by the user or system OOM killer, (2) Invalid memory access from Raylet causing SIGSEGV or SIGBUS, (3) Other termination signals. Last 20 lines of the Raylet logs:
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) main.cc:258: Shutting down...
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) accessor.cc:510: Unregistering node node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e
[2025-01-15 18:16:28,263 I 522173 522173] (raylet) accessor.cc:762: Received notification for node, IsAlive = 0 node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e
[2025-01-15 18:16:28,297 C 522173 522173] (raylet) node_manager.cc:1043: [Timeout] Exiting because this node manager has mistakenly been marked as dead by the GCS: GCS failed to check the health of this node for 5 times. This is likely because the machine or raylet has become overloaded.
*** StackTrace Information ***
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbdf73a) [0x55cefe38d73a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbe1b21) [0x55cefe38fb21] ray::RayLog::~RayLog()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x323299) [0x55cefdad1299] ray::raylet::NodeManager::NodeRemoved()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x536e69) [0x55cefdce4e69] ray::gcs::NodeInfoAccessor::HandleNotification()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x669e98) [0x55cefde17e98] EventTracker::RecordExecution()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x664e8e) [0x55cefde12e8e] std::_Function_handler<>::_M_invoke()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x665306) [0x55cefde13306] boost::asio::detail::completion_handler<>::do_complete()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc53f9b) [0x55cefe401f9b] boost::asio::detail::scheduler::do_run_one()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56529) [0x55cefe404529] boost::asio::detail::scheduler::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56a42) [0x55cefe404a42] boost::asio::io_context::run()
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x1e9155) [0x55cefd997155] main
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc9b614fd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc9b614fe40] __libc_start_main
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x243277) [0x55cefd9f1277]
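
The log lines above appear to follow a fixed layout: `[timestamp LEVEL pid tid] (component) file:line: message`. A minimal sketch for pulling the fatal entry out of such a log follows. It assumes that layout holds for every line and that the single-letter level `C` marks the critical entry, as it does in the `node_manager.cc:1043` line. The names `LOG_LINE`, `parse_line`, and `fatal_entries` are hypothetical helpers and not part of Ray.

```python
import re

# Layout inferred from the log lines in this dump (an assumption, not a Ray spec):
# [YYYY-MM-DD HH:MM:SS,mmm LEVEL pid tid] (component) file:line: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) (?P<pid>\d+) (?P<tid>\d+)\] "
    r"\((?P<component>[^)]+)\) (?P<source>[^:]+):(?P<line>\d+): "
    r"(?P<message>.*)$"
)


def parse_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(text)
    return match.groupdict() if match else None


def fatal_entries(lines):
    """Return parsed entries whose level letter is 'C' (assumed to mean critical)."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "C"]


# Two lines copied from the dump (the second message is shortened here)
# plus the stack-trace banner, which is expected not to match.
sample = [
    "[2025-01-15 18:16:28,261 I 522173 522173] (raylet) main.cc:258: Shutting down...",
    "[2025-01-15 18:16:28,297 C 522173 522173] (raylet) node_manager.cc:1043: [Timeout] Exiting",
    "*** StackTrace Information ***",
]

for entry in fatal_entries(sample):
    print(entry["timestamp"], entry["source"], entry["line"], entry["message"])
```

Stack-trace frames and the banner line do not match the pattern, so `parse_line` returns `None` for them and they are skipped rather than misread as log entries.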