|
[2025-01-20 22:49:27,472 I 10411 10411] (raylet) main.cc:180: Setting cluster ID to: 3a66288956e3f6c454620452bef2327167564ea3b3c92783b86c76ba |
|
[2025-01-20 22:49:27,482 I 10411 10411] (raylet) main.cc:289: Raylet is not set to kill unknown children. |
|
[2025-01-20 22:49:27,482 I 10411 10411] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service. |
|
[2025-01-20 22:49:27,482 I 10411 10411] (raylet) main.cc:419: Setting node ID node_id=436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[2025-01-20 22:49:27,482 I 10411 10411] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 28.8967GB of memory. |
|
[2025-01-20 22:49:27,482 I 10411 10411] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled |
|
[2025-01-20 22:49:27,483 I 10411 10439] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(28896788488, /dev/shm/plasmaXXXXXX) |
|
[2025-01-20 22:49:27,484 I 10411 10439] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 28.8967 GB |
|
- num bytes created total: 0 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-20 22:49:28,487 I 10411 10411] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 37789. |
|
[2025-01-20 22:49:28,489 I 10411 10411] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy. |
|
[2025-01-20 22:49:28,490 I 10411 10411] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952 |
|
[2025-01-20 22:49:28,490 I 10411 10411] (raylet) node_manager.cc:287: Initializing NodeManager node_id=436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[2025-01-20 22:49:28,491 I 10411 10411] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 33311. |
|
[2025-01-20 22:49:28,501 I 10411 10501] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335 |
|
[2025-01-20 22:49:28,501 I 10411 10503] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent |
|
[2025-01-20 22:49:28,502 I 10411 10411] (raylet) event.cc:493: Ray Event initialized for RAYLET |
|
[2025-01-20 22:49:28,502 I 10411 10411] (raylet) event.cc:324: Set ray event level to warning |
|
[2025-01-20 22:49:28,504 I 10411 10411] (raylet) raylet.cc:134: Raylet of id, 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 started. Raylet consists of node_manager and object_manager. node_manager address: 192.168.0.2:33311 object_manager address: 192.168.0.2:37789 hostname: 0cd925b1f73b |
|
[2025-01-20 22:49:28,506 I 10411 10411] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 577933393920000, object_store_memory: 288966696960000, CPU: 160000, GPU: 20000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 5156797141345256205 Local resources: {"total":{GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "available": {GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5156797141345256205{"total":{CPU: 160000, memory: 577933393920000, object_store_memory: 288966696960000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, node:__internal_head__: 10000, GPU: 20000}}, "available": {object_store_memory: 288966696960000, memory: 577933393920000, CPU: 160000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 69931878457052000.000 |
|
[state-dump] - num location lookups per second: 69931878457040000.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 28896669696 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 0 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 0 |
|
[state-dump] - num PYTHON drivers: 0 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 0 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 28 total (13 active) |
|
[state-dump] Queueing time: mean = 1.568 ms, max = 12.297 ms, min = 61.360 us, total = 43.909 ms |
|
[state-dump] Execution time: mean = 36.726 ms, total = 1.028 s |
|
[state-dump] Event stats: |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 166.818 us, total = 1.835 ms, Queueing time: mean = 3.951 ms, max = 12.297 ms, min = 61.360 us, total = 43.463 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 115.896 us, max = 115.896 us, min = 115.896 us, total = 115.896 us |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 6.029 us, total = 6.029 us, Queueing time: mean = 210.302 us, max = 210.302 us, min = 210.302 us, total = 210.302 us |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.603 ms, total = 1.603 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 633.811 us, total = 633.811 us, Queueing time: mean = 119.913 us, max = 119.913 us, min = 119.913 us, total = 119.913 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.687 ms, total = 1.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.936 ms, total = 1.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] DebugString() time ms: 0 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-20 22:49:28,508 I 10411 10411] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[2025-01-20 22:49:28,582 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10539, the token is 0 |
|
[2025-01-20 22:49:28,586 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10540, the token is 1 |
|
[2025-01-20 22:49:28,590 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10541, the token is 2 |
|
[2025-01-20 22:49:28,592 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10542, the token is 3 |
|
[2025-01-20 22:49:28,594 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10543, the token is 4 |
|
[2025-01-20 22:49:28,596 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10544, the token is 5 |
|
[2025-01-20 22:49:28,598 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10545, the token is 6 |
|
[2025-01-20 22:49:28,600 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10546, the token is 7 |
|
[2025-01-20 22:49:28,602 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10547, the token is 8 |
|
[2025-01-20 22:49:28,604 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10548, the token is 9 |
|
[2025-01-20 22:49:28,606 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10549, the token is 10 |
|
[2025-01-20 22:49:28,608 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10550, the token is 11 |
|
[2025-01-20 22:49:28,610 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10551, the token is 12 |
|
[2025-01-20 22:49:28,612 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10552, the token is 13 |
|
[2025-01-20 22:49:28,614 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10553, the token is 14 |
|
[2025-01-20 22:49:28,616 I 10411 10411] (raylet) worker_pool.cc:501: Started worker process with pid 10554, the token is 15 |
|
[2025-01-20 22:49:29,219 I 10411 10439] (raylet) object_store.cc:35: Object store current usage 8e-09 / 28.8967 GB. |
|
[2025-01-20 22:49:29,296 I 10411 10411] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-20 22:49:37,498 W 10411 10433] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:53711: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster. |
|
[2025-01-20 22:50:27,485 I 10411 10439] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 28.8967 GB |
|
- num bytes created total: 136 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-20 22:50:28,509 I 10411 10411] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 577933393920000, object_store_memory: 288966696960000, CPU: 160000, GPU: 20000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 5156797141345256205 Local resources: {"total":{GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "available": {GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5156797141345256205{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 577933393920000, node:__internal_head__: 10000, GPU: 20000, CPU: 160000, object_store_memory: 288966696960000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, memory: 577933393920000, GPU: 20000, node:__internal_head__: 10000, CPU: 160000, object_store_memory: 288966696960000}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 28896669696 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 16 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 16 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 5028 total (31 active) |
|
[state-dump] Queueing time: mean = 1.462 ms, max = 4.119 s, min = 53.000 ns, total = 7.349 s |
|
[state-dump] Execution time: mean = 549.515 us, total = 2.763 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1020 total (0 active), Execution time: mean = 560.029 us, total = 571.230 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1020 total (0 active), Execution time: mean = 38.866 us, total = 39.643 ms, Queueing time: mean = 113.960 us, max = 366.689 us, min = 11.237 us, total = 116.239 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 11.053 us, total = 6.632 ms, Queueing time: mean = 107.344 us, max = 7.768 ms, min = 22.646 us, total = 64.407 ms |
|
[state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 2.985 us, total = 1.791 ms, Queueing time: mean = 114.428 us, max = 7.776 ms, min = 29.415 us, total = 68.657 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 5.912 us, total = 3.547 ms, Queueing time: mean = 113.667 us, max = 448.428 us, min = 4.677 us, total = 68.200 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 17.494 us, total = 5.248 ms, Queueing time: mean = 79.216 us, max = 1.318 ms, min = 17.876 us, total = 23.765 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 457.322 us, total = 109.757 ms, Queueing time: mean = 78.263 us, max = 170.408 us, min = 13.597 us, total = 18.783 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 73 total (17 active), Execution time: mean = 7.363 us, total = 537.491 us, Queueing time: mean = 93.740 ms, max = 4.119 s, min = 23.863 us, total = 6.843 s |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 8.193 us, total = 491.589 us, Queueing time: mean = 179.286 us, max = 1.521 ms, min = 13.525 us, total = 10.757 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 109.755 us, total = 6.585 ms, Queueing time: mean = 113.466 us, max = 178.893 us, min = 39.267 us, total = 6.808 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 671.602 us, total = 40.296 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 15.913 us, total = 954.760 us, Queueing time: mean = 69.086 us, max = 113.413 us, min = 10.081 us, total = 4.145 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.604 us, total = 156.214 us, Queueing time: mean = 183.488 us, max = 1.518 ms, min = 8.952 us, total = 11.009 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 56 total (0 active), Execution time: mean = 853.118 us, total = 47.775 ms, Queueing time: mean = 50.305 us, max = 330.349 us, min = 4.637 us, total = 2.817 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 9.251 us, total = 194.273 us, Queueing time: mean = 73.506 us, max = 138.971 us, min = 29.094 us, total = 1.544 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 18 total (0 active), Execution time: mean = 1.183 us, total = 21.290 us, Queueing time: mean = 58.972 us, max = 350.163 us, min = 11.754 us, total = 1.061 ms |
|
[state-dump] ObjectManager.ObjectAdded - 17 total (0 active), Execution time: mean = 10.989 us, total = 186.814 us, Queueing time: mean = 90.374 us, max = 201.039 us, min = 8.654 us, total = 1.536 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 17 total (0 active), Execution time: mean = 111.454 us, total = 1.895 ms, Queueing time: mean = 2.344 ms, max = 37.348 ms, min = 10.325 us, total = 39.845 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 17 total (0 active), Execution time: mean = 3.105 ms, total = 52.788 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ObjectManager.ObjectDeleted - 17 total (0 active), Execution time: mean = 19.136 us, total = 325.310 us, Queueing time: mean = 150.300 us, max = 363.207 us, min = 38.732 us, total = 2.555 ms |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 193.429 us, total = 2.515 ms, Queueing time: mean = 3.658 ms, max = 12.297 ms, min = 61.360 us, total = 47.557 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 53.323 us, total = 639.871 us, Queueing time: mean = 119.886 us, max = 165.718 us, min = 36.527 us, total = 1.439 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.477 ms, total = 17.729 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 547.230 us, total = 6.567 ms, Queueing time: mean = 327.793 us, max = 1.102 ms, min = 20.120 us, total = 3.934 ms |
|
[state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 263.085 us, total = 3.157 ms, Queueing time: mean = 585.817 us, max = 1.435 ms, min = 90.244 us, total = 7.030 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.583 ms, total = 9.500 ms, Queueing time: mean = 50.713 us, max = 67.462 us, min = 52.669 us, total = 304.278 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 831.908 us, total = 2.496 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 172.808 us, total = 518.424 us, Queueing time: mean = 157.745 us, max = 249.875 us, min = 104.417 us, total = 473.236 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 37.328 us, total = 111.985 us, Queueing time: mean = 70.069 us, max = 108.129 us, min = 36.205 us, total = 210.206 us |
|
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 171.243 us, total = 513.728 us, Queueing time: mean = 495.000 ns, max = 795.000 ns, min = 64.000 ns, total = 1.485 us |
|
[state-dump] - 3 total (0 active), Execution time: mean = 1.041 us, total = 3.122 us, Queueing time: mean = 173.118 us, max = 225.649 us, min = 112.440 us, total = 519.354 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 105.602 us, total = 316.805 us, Queueing time: mean = 94.684 us, max = 118.743 us, min = 66.960 us, total = 284.051 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 618.798 us, total = 1.856 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 395.836 ms, total = 791.673 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.672 ms, total = 3.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.103 us, total = 4.205 us, Queueing time: mean = 207.500 ns, max = 362.000 ns, min = 53.000 ns, total = 415.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 151.719 us, total = 303.438 us, Queueing time: mean = 634.301 us, max = 1.112 ms, min = 156.108 us, total = 1.269 ms |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 65.944 us, total = 65.944 us, Queueing time: mean = 245.340 us, max = 245.340 us, min = 245.340 us, total = 245.340 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.936 ms, total = 1.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 250.770 us, total = 250.770 us, Queueing time: mean = 167.708 us, max = 167.708 us, min = 167.708 us, total = 167.708 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 110.445 us, total = 110.445 us, Queueing time: mean = 119.814 us, max = 119.814 us, min = 119.814 us, total = 119.814 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.687 ms, total = 1.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 25.894 us, total = 25.894 us, Queueing time: mean = 128.891 us, max = 128.891 us, min = 128.891 us, total = 128.891 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.896 ms, total = 1.896 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 115.896 us, max = 115.896 us, min = 115.896 us, total = 115.896 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 213.138 us, total = 213.138 us, Queueing time: mean = 91.253 us, max = 91.253 us, min = 91.253 us, total = 91.253 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.171 ms, total = 1.171 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 633.811 us, total = 633.811 us, Queueing time: mean = 119.913 us, max = 119.913 us, min = 119.913 us, total = 119.913 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.350 ms, total = 1.350 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.637 ms, total = 1.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 57.772 us, total = 57.772 us, Queueing time: mean = 171.436 us, max = 171.436 us, min = 171.436 us, total = 171.436 us |
|
[state-dump] DebugString() time ms: 2 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-20 22:51:27,485 I 10411 10439] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 28.8967 GB |
|
- num bytes created total: 136 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-20 22:51:28,513 I 10411 10411] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 577933393920000, object_store_memory: 288966696960000, CPU: 160000, GPU: 20000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 5156797141345256205 Local resources: {"total":{GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "available": {GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5156797141345256205{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 577933393920000, node:__internal_head__: 10000, GPU: 20000, CPU: 160000, object_store_memory: 288966696960000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, memory: 577933393920000, GPU: 20000, node:__internal_head__: 10000, CPU: 160000, object_store_memory: 288966696960000}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 28896669696 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 16 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 16 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 9780 total (31 active) |
|
[state-dump] Queueing time: mean = 792.628 us, max = 4.119 s, min = 53.000 ns, total = 7.752 s |
|
[state-dump] Execution time: mean = 367.449 us, total = 3.594 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2040 total (0 active), Execution time: mean = 557.796 us, total = 1.138 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2040 total (0 active), Execution time: mean = 39.077 us, total = 79.717 ms, Queueing time: mean = 115.223 us, max = 457.744 us, min = 4.360 us, total = 235.054 ms |
|
[state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 3.089 us, total = 3.704 ms, Queueing time: mean = 109.551 us, max = 7.776 ms, min = 23.923 us, total = 131.352 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 11.518 us, total = 13.811 ms, Queueing time: mean = 102.150 us, max = 7.768 ms, min = 22.646 us, total = 122.478 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 6.207 us, total = 7.442 ms, Queueing time: mean = 114.960 us, max = 448.428 us, min = 4.677 us, total = 137.837 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 19.248 us, total = 11.549 ms, Queueing time: mean = 78.979 us, max = 1.318 ms, min = 17.876 us, total = 47.388 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 480 total (1 active), Execution time: mean = 464.469 us, total = 222.945 ms, Queueing time: mean = 76.100 us, max = 170.408 us, min = 13.597 us, total = 36.528 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: mean = 111.416 us, total = 13.370 ms, Queueing time: mean = 112.850 us, max = 178.893 us, min = 28.435 us, total = 13.542 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 8.596 us, total = 1.031 ms, Queueing time: mean = 182.385 us, max = 1.521 ms, min = 10.003 us, total = 21.886 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 active), Execution time: mean = 664.312 us, total = 79.717 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 120 total (1 active), Execution time: mean = 16.615 us, total = 1.994 ms, Queueing time: mean = 87.285 us, max = 2.335 ms, min = 10.081 us, total = 10.474 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 2.676 us, total = 321.124 us, Queueing time: mean = 186.778 us, max = 1.518 ms, min = 8.097 us, total = 22.413 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 73 total (17 active), Execution time: mean = 7.363 us, total = 537.491 us, Queueing time: mean = 93.740 ms, max = 4.119 s, min = 23.863 us, total = 6.843 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 56 total (0 active), Execution time: mean = 853.118 us, total = 47.775 ms, Queueing time: mean = 50.305 us, max = 330.349 us, min = 4.637 us, total = 2.817 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 9.179 us, total = 376.325 us, Queueing time: mean = 74.669 us, max = 138.971 us, min = 18.048 us, total = 3.061 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.493 ms, total = 35.830 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 56.937 us, total = 1.366 ms, Queueing time: mean = 126.593 us, max = 168.572 us, min = 36.527 us, total = 3.038 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 555.662 us, total = 13.336 ms, Queueing time: mean = 362.533 us, max = 1.102 ms, min = 20.120 us, total = 8.701 ms |
|
[state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 286.192 us, total = 6.869 ms, Queueing time: mean = 621.305 us, max = 1.435 ms, min = 90.244 us, total = 14.911 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 18 total (0 active), Execution time: mean = 1.183 us, total = 21.290 us, Queueing time: mean = 58.972 us, max = 350.163 us, min = 11.754 us, total = 1.061 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 17 total (0 active), Execution time: mean = 111.454 us, total = 1.895 ms, Queueing time: mean = 2.344 ms, max = 37.348 ms, min = 10.325 us, total = 39.845 ms |
|
[state-dump] ObjectManager.ObjectAdded - 17 total (0 active), Execution time: mean = 10.989 us, total = 186.814 us, Queueing time: mean = 90.374 us, max = 201.039 us, min = 8.654 us, total = 1.536 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 17 total (0 active), Execution time: mean = 3.105 ms, total = 52.788 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ObjectManager.ObjectDeleted - 17 total (0 active), Execution time: mean = 19.136 us, total = 325.310 us, Queueing time: mean = 150.300 us, max = 363.207 us, min = 38.732 us, total = 2.555 ms |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 193.429 us, total = 2.515 ms, Queueing time: mean = 3.658 ms, max = 12.297 ms, min = 61.360 us, total = 47.557 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.714 ms, total = 20.566 ms, Queueing time: mean = 68.989 us, max = 128.814 us, min = 52.669 us, total = 827.868 us |
|
[state-dump] - 3 total (0 active), Execution time: mean = 1.041 us, total = 3.122 us, Queueing time: mean = 173.118 us, max = 225.649 us, min = 112.440 us, total = 519.354 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 105.602 us, total = 316.805 us, Queueing time: mean = 94.684 us, max = 118.743 us, min = 66.960 us, total = 284.051 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 831.908 us, total = 2.496 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 37.328 us, total = 111.985 us, Queueing time: mean = 70.069 us, max = 108.129 us, min = 36.205 us, total = 210.206 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 618.798 us, total = 1.856 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 171.243 us, total = 513.728 us, Queueing time: mean = 495.000 ns, max = 795.000 ns, min = 64.000 ns, total = 1.485 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 172.808 us, total = 518.424 us, Queueing time: mean = 157.745 us, max = 249.875 us, min = 104.417 us, total = 473.236 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 395.836 ms, total = 791.673 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.103 us, total = 4.205 us, Queueing time: mean = 207.500 ns, max = 362.000 ns, min = 53.000 ns, total = 415.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.672 ms, total = 3.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 151.719 us, total = 303.438 us, Queueing time: mean = 634.301 us, max = 1.112 ms, min = 156.108 us, total = 1.269 ms |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.483 ms, total = 2.965 ms, Queueing time: mean = 33.331 us, max = 66.661 us, min = 66.661 us, total = 66.661 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 115.896 us, max = 115.896 us, min = 115.896 us, total = 115.896 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 250.770 us, total = 250.770 us, Queueing time: mean = 167.708 us, max = 167.708 us, min = 167.708 us, total = 167.708 us |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.936 ms, total = 1.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 110.445 us, total = 110.445 us, Queueing time: mean = 119.814 us, max = 119.814 us, min = 119.814 us, total = 119.814 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 65.944 us, total = 65.944 us, Queueing time: mean = 245.340 us, max = 245.340 us, min = 245.340 us, total = 245.340 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 25.894 us, total = 25.894 us, Queueing time: mean = 128.891 us, max = 128.891 us, min = 128.891 us, total = 128.891 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.896 ms, total = 1.896 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 213.138 us, total = 213.138 us, Queueing time: mean = 91.253 us, max = 91.253 us, min = 91.253 us, total = 91.253 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.687 ms, total = 1.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.171 ms, total = 1.171 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 57.772 us, total = 57.772 us, Queueing time: mean = 171.436 us, max = 171.436 us, min = 171.436 us, total = 171.436 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.637 ms, total = 1.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.350 ms, total = 1.350 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 633.811 us, total = 633.811 us, Queueing time: mean = 119.913 us, max = 119.913 us, min = 119.913 us, total = 119.913 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-20 22:52:27,485 I 10411 10439] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 28.8967 GB |
|
- num bytes created total: 136 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-20 22:52:28,516 I 10411 10411] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 577933393920000, object_store_memory: 288966696960000, CPU: 160000, GPU: 20000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 5156797141345256205 Local resources: {"total":{GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "available": {GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5156797141345256205{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 577933393920000, node:__internal_head__: 10000, GPU: 20000, CPU: 160000, object_store_memory: 288966696960000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, memory: 577933393920000, GPU: 20000, node:__internal_head__: 10000, CPU: 160000, object_store_memory: 288966696960000}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 28896669696 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 16 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 16 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 14534 total (31 active) |
|
[state-dump] Queueing time: mean = 560.947 us, max = 4.119 s, min = 53.000 ns, total = 8.153 s |
|
[state-dump] Execution time: mean = 304.389 us, total = 4.424 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3060 total (0 active), Execution time: mean = 555.570 us, total = 1.700 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3060 total (0 active), Execution time: mean = 39.133 us, total = 119.747 ms, Queueing time: mean = 113.844 us, max = 457.744 us, min = 4.360 us, total = 348.364 ms |
|
[state-dump] NodeManager.CheckGC - 1799 total (1 active), Execution time: mean = 3.092 us, total = 5.562 ms, Queueing time: mean = 106.415 us, max = 7.776 ms, min = 12.553 us, total = 191.441 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 1799 total (1 active), Execution time: mean = 11.568 us, total = 20.811 ms, Queueing time: mean = 98.964 us, max = 7.768 ms, min = 17.205 us, total = 178.036 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1799 total (0 active), Execution time: mean = 6.242 us, total = 11.230 ms, Queueing time: mean = 113.511 us, max = 448.428 us, min = 4.677 us, total = 204.206 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 19.765 us, total = 17.789 ms, Queueing time: mean = 77.547 us, max = 1.318 ms, min = 16.015 us, total = 69.793 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 465.029 us, total = 334.356 ms, Queueing time: mean = 85.031 us, max = 5.629 ms, min = 9.673 us, total = 61.137 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: mean = 112.686 us, total = 20.284 ms, Queueing time: mean = 114.540 us, max = 179.044 us, min = 28.435 us, total = 20.617 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 8.978 us, total = 1.616 ms, Queueing time: mean = 196.102 us, max = 2.590 ms, min = 10.003 us, total = 35.298 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 673.587 us, total = 121.246 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 180 total (1 active), Execution time: mean = 17.147 us, total = 3.086 ms, Queueing time: mean = 81.875 us, max = 2.335 ms, min = 10.081 us, total = 14.738 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.719 us, total = 489.401 us, Queueing time: mean = 200.703 us, max = 2.588 ms, min = 8.097 us, total = 36.127 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 73 total (17 active), Execution time: mean = 7.363 us, total = 537.491 us, Queueing time: mean = 93.740 ms, max = 4.119 s, min = 23.863 us, total = 6.843 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 9.290 us, total = 566.674 us, Queueing time: mean = 74.143 us, max = 138.971 us, min = 18.048 us, total = 4.523 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 56 total (0 active), Execution time: mean = 853.118 us, total = 47.775 ms, Queueing time: mean = 50.305 us, max = 330.349 us, min = 4.637 us, total = 2.817 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.526 ms, total = 54.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 55.622 us, total = 2.002 ms, Queueing time: mean = 118.232 us, max = 168.572 us, min = 35.268 us, total = 4.256 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 584.955 us, total = 21.058 ms, Queueing time: mean = 412.092 us, max = 2.227 ms, min = 20.120 us, total = 14.835 ms |
|
[state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 288.581 us, total = 10.389 ms, Queueing time: mean = 700.786 us, max = 2.620 ms, min = 90.244 us, total = 25.228 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 18 total (0 active), Execution time: mean = 1.183 us, total = 21.290 us, Queueing time: mean = 58.972 us, max = 350.163 us, min = 11.754 us, total = 1.061 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.878 ms, total = 33.805 ms, Queueing time: mean = 66.747 us, max = 128.814 us, min = 40.760 us, total = 1.201 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 17 total (0 active), Execution time: mean = 111.454 us, total = 1.895 ms, Queueing time: mean = 2.344 ms, max = 37.348 ms, min = 10.325 us, total = 39.845 ms |
|
[state-dump] ObjectManager.ObjectAdded - 17 total (0 active), Execution time: mean = 10.989 us, total = 186.814 us, Queueing time: mean = 90.374 us, max = 201.039 us, min = 8.654 us, total = 1.536 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 17 total (0 active), Execution time: mean = 3.105 ms, total = 52.788 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ObjectManager.ObjectDeleted - 17 total (0 active), Execution time: mean = 19.136 us, total = 325.310 us, Queueing time: mean = 150.300 us, max = 363.207 us, min = 38.732 us, total = 2.555 ms |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 193.429 us, total = 2.515 ms, Queueing time: mean = 3.658 ms, max = 12.297 ms, min = 61.360 us, total = 47.557 ms |
|
[state-dump] - 3 total (0 active), Execution time: mean = 1.041 us, total = 3.122 us, Queueing time: mean = 173.118 us, max = 225.649 us, min = 112.440 us, total = 519.354 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 105.602 us, total = 316.805 us, Queueing time: mean = 94.684 us, max = 118.743 us, min = 66.960 us, total = 284.051 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 831.908 us, total = 2.496 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 37.328 us, total = 111.985 us, Queueing time: mean = 70.069 us, max = 108.129 us, min = 36.205 us, total = 210.206 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 618.798 us, total = 1.856 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 171.243 us, total = 513.728 us, Queueing time: mean = 495.000 ns, max = 795.000 ns, min = 64.000 ns, total = 1.485 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 2.043 ms, total = 6.130 ms, Queueing time: mean = 222.033 us, max = 599.439 us, min = 66.661 us, total = 666.100 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 172.808 us, total = 518.424 us, Queueing time: mean = 157.745 us, max = 249.875 us, min = 104.417 us, total = 473.236 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 395.836 ms, total = 791.673 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.103 us, total = 4.205 us, Queueing time: mean = 207.500 ns, max = 362.000 ns, min = 53.000 ns, total = 415.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.672 ms, total = 3.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 151.719 us, total = 303.438 us, Queueing time: mean = 634.301 us, max = 1.112 ms, min = 156.108 us, total = 1.269 ms |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 115.896 us, max = 115.896 us, min = 115.896 us, total = 115.896 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 250.770 us, total = 250.770 us, Queueing time: mean = 167.708 us, max = 167.708 us, min = 167.708 us, total = 167.708 us |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.936 ms, total = 1.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 110.445 us, total = 110.445 us, Queueing time: mean = 119.814 us, max = 119.814 us, min = 119.814 us, total = 119.814 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 65.944 us, total = 65.944 us, Queueing time: mean = 245.340 us, max = 245.340 us, min = 245.340 us, total = 245.340 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 25.894 us, total = 25.894 us, Queueing time: mean = 128.891 us, max = 128.891 us, min = 128.891 us, total = 128.891 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.896 ms, total = 1.896 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 213.138 us, total = 213.138 us, Queueing time: mean = 91.253 us, max = 91.253 us, min = 91.253 us, total = 91.253 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.687 ms, total = 1.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.171 ms, total = 1.171 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 57.772 us, total = 57.772 us, Queueing time: mean = 171.436 us, max = 171.436 us, min = 171.436 us, total = 171.436 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.637 ms, total = 1.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.350 ms, total = 1.350 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 633.811 us, total = 633.811 us, Queueing time: mean = 119.913 us, max = 119.913 us, min = 119.913 us, total = 119.913 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-20 22:52:57,926 I 10411 10411] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-20 22:52:57,927 I 10411 10411] (raylet) node_manager.cc:1586: Driver (pid=8700) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000 |
|
[2025-01-20 22:52:57,933 I 10411 10411] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-20 22:52:58,002 I 10411 10411] (raylet) worker_pool.cc:1119: Force exiting worker whose job has exited d82d64e593659f61880eb00ec26bb55a7feb8160053c8103e6e82b56 |
|
[2025-01-20 22:52:58,003 I 10411 10411] (raylet) worker_pool.cc:1119: Force exiting worker whose job has exited cc2a657ca6b8a0842e5e28f0ed26722f2c81f23b5671e82af9670ecd |
|
[2025-01-20 22:52:58,004 I 10411 10411] (raylet) worker_pool.cc:1119: Force exiting worker whose job has exited e599f7546d06384ea290c4891a80c8f5dd640ca4bfad872f86069284 |
|
[2025-01-20 22:52:58,009 I 10411 10411] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-20 22:52:58,012 I 10411 10411] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-20 22:52:58,012 I 10411 10411] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-20 22:52:58,016 I 10411 10411] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None |
|
[2025-01-20 22:52:58,016 I 10411 10411] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM |
|
[2025-01-20 22:52:58,016 I 10411 10411] (raylet) main.cc:258: Shutting down... |
|
[2025-01-20 22:52:58,016 I 10411 10411] (raylet) accessor.cc:510: Unregistering node node_id=436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[2025-01-20 22:52:58,019 I 10411 10411] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 |
|
[2025-01-20 22:52:58,022 I 10411 10411] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 10500. |
|
[2025-01-20 22:52:58,035 I 10411 10501] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0. |
|
[2025-01-20 22:52:58,035 I 10411 10411] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 10502. |
|
[2025-01-20 22:52:58,043 I 10411 10503] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0. |
|
[2025-01-20 22:52:58,044 I 10411 10411] (raylet) io_service_pool.cc:47: IOServicePool is stopped. |
|
[2025-01-20 22:52:58,143 I 10411 10411] (raylet) stats.h:120: Stats module has shutdown. |
|
|