|
[2025-01-15 18:16:19,955 I 522173 522173] (raylet) main.cc:180: Setting cluster ID to: fbcacd8c56c8a248301e2268894a7704d9f9a83b5100964a45abdcd8 |
|
[2025-01-15 18:16:19,964 I 522173 522173] (raylet) main.cc:289: Raylet is not set to kill unknown children. |
|
[2025-01-15 18:16:19,964 I 522173 522173] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service. |
|
[2025-01-15 18:16:19,964 I 522173 522173] (raylet) main.cc:419: Setting node ID node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[2025-01-15 18:16:19,965 I 522173 522173] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory. |
|
[2025-01-15 18:16:19,965 I 522173 522173] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled |
|
[2025-01-15 18:16:19,965 I 522173 522202] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX) |
|
[2025-01-15 18:16:19,967 I 522173 522202] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 0 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-15 18:16:20,970 I 522173 522173] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 38501. |
|
[2025-01-15 18:16:20,974 I 522173 522173] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy. |
|
[2025-01-15 18:16:20,974 I 522173 522173] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952 |
|
[2025-01-15 18:16:20,974 I 522173 522173] (raylet) node_manager.cc:287: Initializing NodeManager node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[2025-01-15 18:16:20,976 I 522173 522173] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 44855. |
|
[2025-01-15 18:16:20,984 I 522173 522266] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335 |
|
[2025-01-15 18:16:20,985 I 522173 522268] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent |
|
[2025-01-15 18:16:20,985 I 522173 522173] (raylet) event.cc:493: Ray Event initialized for RAYLET |
|
[2025-01-15 18:16:20,985 I 522173 522173] (raylet) event.cc:324: Set ray event level to warning |
|
[2025-01-15 18:16:20,987 I 522173 522173] (raylet) raylet.cc:134: Raylet of id, a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e started. Raylet consists of node_manager and object_manager. node_manager address: 192.168.0.2:44855 object_manager address: 192.168.0.2:38501 hostname: 0cd925b1f73b |
|
[2025-01-15 18:16:20,990 I 522173 522173] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 864509526020000, accelerator_type:A40: 10000, node:__internal_head__: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: -1801459830053656226 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [864509526020000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [864509526020000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1801459830053656226{"total":{GPU: 20000, object_store_memory: 21474836480000, memory: 864509526020000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000}}, "available": {GPU: 20000, object_store_memory: 21474836480000, memory: 864509526020000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 70252157607644000.000 |
|
[state-dump] - num location lookups per second: 70252157607632000.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 0 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 0 |
|
[state-dump] - num PYTHON drivers: 0 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 0 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 28 total (13 active) |
|
[state-dump] Queueing time: mean = 1.419 ms, max = 10.984 ms, min = 25.082 us, total = 39.727 ms |
|
[state-dump] Execution time: mean = 36.753 ms, total = 1.029 s |
|
[state-dump] Event stats: |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 164.545 us, total = 1.810 ms, Queueing time: mean = 3.584 ms, max = 10.984 ms, min = 25.082 us, total = 39.427 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.384 ms, total = 2.384 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.677 ms, total = 1.677 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 298.400 us, total = 298.400 us, Queueing time: mean = 102.714 us, max = 102.714 us, min = 102.714 us, total = 102.714 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.259 ms, total = 1.259 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 77.830 us, max = 77.830 us, min = 77.830 us, total = 77.830 us |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 4.612 us, total = 4.612 us, Queueing time: mean = 120.066 us, max = 120.066 us, min = 120.066 us, total = 120.066 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] DebugString() time ms: 0 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-15 18:16:20,991 I 522173 522173] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[2025-01-15 18:16:21,055 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522305, the token is 0 |
|
[2025-01-15 18:16:21,059 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522306, the token is 1 |
|
[2025-01-15 18:16:21,061 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522307, the token is 2 |
|
[2025-01-15 18:16:21,063 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522308, the token is 3 |
|
[2025-01-15 18:16:21,065 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522309, the token is 4 |
|
[2025-01-15 18:16:21,067 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522310, the token is 5 |
|
[2025-01-15 18:16:21,069 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522311, the token is 6 |
|
[2025-01-15 18:16:21,072 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522312, the token is 7 |
|
[2025-01-15 18:16:21,074 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522313, the token is 8 |
|
[2025-01-15 18:16:21,076 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522314, the token is 9 |
|
[2025-01-15 18:16:21,078 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522315, the token is 10 |
|
[2025-01-15 18:16:21,080 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522316, the token is 11 |
|
[2025-01-15 18:16:21,082 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522317, the token is 12 |
|
[2025-01-15 18:16:21,084 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522318, the token is 13 |
|
[2025-01-15 18:16:21,085 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522319, the token is 14 |
|
[2025-01-15 18:16:21,087 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522320, the token is 15 |
|
[2025-01-15 18:16:21,089 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522321, the token is 16 |
|
[2025-01-15 18:16:21,091 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522322, the token is 17 |
|
[2025-01-15 18:16:21,094 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522323, the token is 18 |
|
[2025-01-15 18:16:21,096 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 522324, the token is 19 |
|
[2025-01-15 18:16:21,757 I 522173 522202] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB. |
|
[2025-01-15 18:16:21,929 I 522173 522173] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-15 18:16:22,696 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:23,037 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,038 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,038 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,038 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,039 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,039 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,040 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,040 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,040 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,044 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,045 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,046 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,046 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,046 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,046 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,047 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,048 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,048 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-15 18:16:23,459 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:23,468 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 524007, the token is 20 |
|
[2025-01-15 18:16:24,649 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:24,659 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 524108, the token is 21 |
|
[2025-01-15 18:16:25,805 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:25,815 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 524209, the token is 22 |
|
[2025-01-15 18:16:27,006 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:27,014 I 522173 522173] (raylet) worker_pool.cc:501: Started worker process with pid 524310, the token is 23 |
|
[2025-01-15 18:16:28,206 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:28,229 I 522173 522173] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-15 18:16:28,230 I 522173 522173] (raylet) node_manager.cc:1586: Driver (pid=521907) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000 |
|
[2025-01-15 18:16:28,233 I 522173 522173] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None |
|
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM |
|
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) main.cc:258: Shutting down... |
|
[2025-01-15 18:16:28,261 I 522173 522173] (raylet) accessor.cc:510: Unregistering node node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[2025-01-15 18:16:28,263 I 522173 522173] (raylet) accessor.cc:762: Received notification for node, IsAlive = 0 node_id=a15143d135fdf90e02f60919a4290e512d91060e1590c2c0978ed15e |
|
[2025-01-15 18:16:28,297 C 522173 522173] (raylet) node_manager.cc:1043: [Timeout] Exiting because this node manager has mistakenly been marked as dead by the GCS: GCS failed to check the health of this node for 5 times. This is likely because the machine or raylet has become overloaded. |
|
*** StackTrace Information *** |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbdf73a) [0x55cefe38d73a] ray::operator<<() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbe1b21) [0x55cefe38fb21] ray::RayLog::~RayLog() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x323299) [0x55cefdad1299] ray::raylet::NodeManager::NodeRemoved() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x536e69) [0x55cefdce4e69] ray::gcs::NodeInfoAccessor::HandleNotification() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x669e98) [0x55cefde17e98] EventTracker::RecordExecution() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x664e8e) [0x55cefde12e8e] std::_Function_handler<>::_M_invoke() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x665306) [0x55cefde13306] boost::asio::detail::completion_handler<>::do_complete() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc53f9b) [0x55cefe401f9b] boost::asio::detail::scheduler::do_run_one() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56529) [0x55cefe404529] boost::asio::detail::scheduler::run() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56a42) [0x55cefe404a42] boost::asio::io_context::run() |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x1e9155) [0x55cefd997155] main |
|
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc9b614fd90] |
|
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc9b614fe40] __libc_start_main |
|
/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x243277) [0x55cefd9f1277] |
|