[2025-01-15 18:15:45,507 I 517589 517589] (raylet) main.cc:180: Setting cluster ID to: c36f90a03eb214af71608b721c24e70055c82cf4a8c1f87ce389b92c
[2025-01-15 18:15:45,516 I 517589 517589] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-15 18:15:45,516 I 517589 517589] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-15 18:15:45,517 I 517589 517589] (raylet) main.cc:419: Setting node ID node_id=594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[2025-01-15 18:15:45,517 I 517589 517589] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-15 18:15:45,517 I 517589 517589] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-15 18:15:45,518 I 517589 517618] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-15 18:15:45,519 I 517589 517618] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-15 18:15:46,523 I 517589 517589] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 33173.
[2025-01-15 18:15:46,526 I 517589 517589] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-15 18:15:46,527 I 517589 517589] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-15 18:15:46,527 I 517589 517589] (raylet) node_manager.cc:287: Initializing NodeManager node_id=594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[2025-01-15 18:15:46,528 I 517589 517589] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 39451.
[2025-01-15 18:15:46,537 I 517589 517682] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-15 18:15:46,538 I 517589 517684] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-15 18:15:46,538 I 517589 517589] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-15 18:15:46,538 I 517589 517589] (raylet) event.cc:324: Set ray event level to warning
[2025-01-15 18:15:46,540 I 517589 517589] (raylet) raylet.cc:134: Raylet of id, 594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be started. Raylet consists of node_manager and object_manager. node_manager address: 192.168.0.2:39451 object_manager address: 192.168.0.2:33173 hostname: 0cd925b1f73b
[2025-01-15 18:15:46,543 I 517589 517589] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 864744902660000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: 5613091048481760916 Local resources: {"total":{node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [864744902660000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "available": {node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [864744902660000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "labels":{"ray.io/node_id":"594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5613091048481760916{"total":{object_store_memory: 21474836480000, memory: 864744902660000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, node:192.168.0.2: 10000, GPU: 20000}}, "available": {object_store_memory: 21474836480000, memory: 864744902660000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, node:192.168.0.2: 10000, GPU: 20000}}, "labels":{"ray.io/node_id":"594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 69998594105052000.000
[state-dump] - num location lookups per second: 69998594105040000.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 0
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 0
[state-dump] - num PYTHON drivers: 0
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 0
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 28 total (13 active)
[state-dump] Queueing time: mean = 1.467 ms, max = 11.475 ms, min = 28.733 us, total = 41.081 ms
[state-dump] Execution time: mean = 36.782 ms, total = 1.030 s
[state-dump] Event stats:
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 170.919 us, total = 1.880 ms, Queueing time: mean = 3.701 ms, max = 11.475 ms, min = 28.733 us, total = 40.710 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.723 ms, total = 1.723 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.309 ms, total = 2.309 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 4.329 us, total = 4.329 us, Queueing time: mean = 140.059 us, max = 140.059 us, min = 140.059 us, total = 140.059 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 109.142 us, max = 109.142 us, min = 109.142 us, total = 109.142 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.589 ms, total = 1.589 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.102 us, total = 247.102 us, Queueing time: mean = 121.669 us, max = 121.669 us, min = 121.669 us, total = 121.669 us
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-15 18:15:46,545 I 517589 517589] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[2025-01-15 18:15:46,607 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517721, the token is 0
[2025-01-15 18:15:46,611 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517722, the token is 1
[2025-01-15 18:15:46,613 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517723, the token is 2
[2025-01-15 18:15:46,615 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517724, the token is 3
[2025-01-15 18:15:46,617 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517725, the token is 4
[2025-01-15 18:15:46,619 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517726, the token is 5
[2025-01-15 18:15:46,622 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517727, the token is 6
[2025-01-15 18:15:46,624 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517728, the token is 7
[2025-01-15 18:15:46,626 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517729, the token is 8
[2025-01-15 18:15:46,629 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517730, the token is 9
[2025-01-15 18:15:46,632 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517731, the token is 10
[2025-01-15 18:15:46,634 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517732, the token is 11
[2025-01-15 18:15:46,636 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517733, the token is 12
[2025-01-15 18:15:46,638 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517734, the token is 13
[2025-01-15 18:15:46,640 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517735, the token is 14
[2025-01-15 18:15:46,642 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517736, the token is 15
[2025-01-15 18:15:46,645 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517737, the token is 16
[2025-01-15 18:15:46,648 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517738, the token is 17
[2025-01-15 18:15:46,650 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517739, the token is 18
[2025-01-15 18:15:46,652 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 517740, the token is 19
[2025-01-15 18:15:47,355 I 517589 517618] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-15 18:15:47,464 I 517589 517589] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-15 18:15:48,253 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false
[2025-01-15 18:15:48,579 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,579 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,580 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,580 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,580 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,581 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,581 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,581 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,581 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,587 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,589 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,590 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,591 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,591 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,591 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,592 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,592 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,593 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:48,956 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false
[2025-01-15 18:15:48,964 I 517589 517589] (raylet) worker_pool.cc:501: Started worker process with pid 519419, the token is 20
[2025-01-15 18:15:49,498 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false
[2025-01-15 18:15:49,499 I 517589 517589] (raylet) node_manager.cc:1586: Driver (pid=517323) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000
[2025-01-15 18:15:49,501 I 517589 517589] (raylet) node_manager.cc:1111: The leased worker 755a376c5f34c660099a786e9c0a24496c4ff184dab5588ae567219d is killed because the owner process 01000000ffffffffffffffffffffffffffffffffffffffffffffffff died.
[2025-01-15 18:15:49,503 I 517589 517589] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-15 18:15:49,503 I 517589 517589] (raylet) node_manager.cc:633: The leased worker is killed because the job 01000000 finished. worker_id=755a376c5f34c660099a786e9c0a24496c4ff184dab5588ae567219d
[2025-01-15 18:15:49,512 I 517589 517589] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false
[2025-01-15 18:15:49,708 I 517589 517589] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None
[2025-01-15 18:15:49,708 I 517589 517589] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM
[2025-01-15 18:15:49,708 I 517589 517589] (raylet) main.cc:258: Shutting down...
[2025-01-15 18:15:49,708 I 517589 517589] (raylet) accessor.cc:510: Unregistering node node_id=594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[2025-01-15 18:15:49,711 I 517589 517589] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=594aea7169520c22e97f1719928454f2113460e3e5c4982ab417b9be
[2025-01-15 18:15:49,715 I 517589 517589] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 517681.
[2025-01-15 18:15:49,727 I 517589 517682] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0.
[2025-01-15 18:15:49,728 I 517589 517589] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 517683.
[2025-01-15 18:15:49,736 I 517589 517684] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0.
[2025-01-15 18:15:49,737 I 517589 517589] (raylet) io_service_pool.cc:47: IOServicePool is stopped.
[2025-01-15 18:15:49,850 I 517589 517589] (raylet) stats.h:120: Stats module has shutdown.