[2025-01-21 05:12:16,186 I 16746 16746] (raylet) main.cc:180: Setting cluster ID to: 621dd511d43f0600c7fa977093088315fa64471d60b691074ed19ce6
[2025-01-21 05:12:16,193 I 16746 16746] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-21 05:12:16,193 I 16746 16746] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-21 05:12:16,194 I 16746 16746] (raylet) main.cc:419: Setting node ID node_id=3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[2025-01-21 05:12:16,194 I 16746 16746] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-21 05:12:16,194 I 16746 16746] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-21 05:12:16,195 I 16746 16775] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-21 05:12:16,196 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:12:16,198 I 16746 16746] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 40125.
[2025-01-21 05:12:16,200 I 16746 16746] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-21 05:12:16,201 I 16746 16746] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-21 05:12:16,201 I 16746 16746] (raylet) node_manager.cc:287: Initializing NodeManager node_id=3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[2025-01-21 05:12:16,201 I 16746 16746] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 38223.
[2025-01-21 05:12:16,207 I 16746 16814] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-21 05:12:16,207 I 16746 16816] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-21 05:12:16,207 I 16746 16746] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-21 05:12:16,207 I 16746 16746] (raylet) event.cc:324: Set ray event level to warning
[2025-01-21 05:12:16,209 I 16746 16746] (raylet) raylet.cc:134: Raylet of id, 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:38223
object_manager address: 192.168.0.2:40125
hostname: 0cd925b1f73b
[2025-01-21 05:12:16,212 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 70204023774940000.000
[state-dump] - num location lookups per second: 70204023774928000.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 0
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 0
[state-dump] - num PYTHON drivers: 0
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 0
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 27 total (13 active)
[state-dump] Queueing time: mean = 1.046 ms, max = 6.946 ms, min = 12.617 us, total = 28.248 ms
[state-dump] Execution time: mean = 810.775 us, total = 21.891 ms
[state-dump] Event stats:
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 200.517 us, total = 2.206 ms, Queueing time: mean = 2.565 ms, max = 6.946 ms, min = 27.474 us, total = 28.214 ms
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.345 ms, total = 1.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-21 05:12:16,213 I 16746 16746] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[2025-01-21 05:12:16,352 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16853, the token is 0
[2025-01-21 05:12:16,355 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16854, the token is 1
[2025-01-21 05:12:16,358 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16855, the token is 2
[2025-01-21 05:12:16,360 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16856, the token is 3
[2025-01-21 05:12:16,362 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16857, the token is 4
[2025-01-21 05:12:16,364 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16858, the token is 5
[2025-01-21 05:12:16,366 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16859, the token is 6
[2025-01-21 05:12:16,367 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16860, the token is 7
[2025-01-21 05:12:16,369 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16861, the token is 8
[2025-01-21 05:12:16,371 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16862, the token is 9
[2025-01-21 05:12:16,373 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16863, the token is 10
[2025-01-21 05:12:16,375 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16864, the token is 11
[2025-01-21 05:12:16,377 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16865, the token is 12
[2025-01-21 05:12:16,378 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16866, the token is 13
[2025-01-21 05:12:16,380 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16867, the token is 14
[2025-01-21 05:12:16,382 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16868, the token is 15
[2025-01-21 05:12:16,384 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16869, the token is 16
[2025-01-21 05:12:16,386 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16870, the token is 17
[2025-01-21 05:12:16,388 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16871, the token is 18
[2025-01-21 05:12:16,390 I 16746 16746] (raylet) worker_pool.cc:501: Started worker process with pid 16872, the token is 19
[2025-01-21 05:12:17,011 I 16746 16775] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-21 05:12:17,144 I 16746 16746] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-21 05:12:26,204 W 16746 16769] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:61114: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-21 05:13:16,196 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:13:16,214 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 5547 total (35 active)
[state-dump] Queueing time: mean = 465.343 us, max = 825.882 ms, min = 171.000 ns, total = 2.581 s
[state-dump] Execution time: mean = 306.171 us, total = 1.698 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1260 total (0 active), Execution time: mean = 350.563 us, total = 441.709 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1260 total (0 active), Execution time: mean = 23.423 us, total = 29.513 ms, Queueing time: mean = 76.366 us, max = 23.460 ms, min = 4.270 us, total = 96.221 ms
[state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 9.949 us, total = 5.969 ms, Queueing time: mean = 46.082 us, max = 393.108 us, min = 10.627 us, total = 27.649 ms
[state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.090 us, total = 1.854 ms, Queueing time: mean = 51.364 us, max = 395.063 us, min = 8.858 us, total = 30.819 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 599 total (0 active), Execution time: mean = 3.906 us, total = 2.340 ms, Queueing time: mean = 54.196 us, max = 374.336 us, min = 11.834 us, total = 32.463 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 12.917 us, total = 3.875 ms, Queueing time: mean = 46.506 us, max = 1.134 ms, min = 8.817 us, total = 13.952 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 410.949 us, total = 98.628 ms, Queueing time: mean = 47.389 us, max = 125.858 us, min = 10.690 us, total = 11.373 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s
[state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 61 total (1 active), Execution time: mean = 9.530 us, total = 581.325 us, Queueing time: mean = 37.780 us, max = 77.711 us, min = 12.092 us, total = 2.305 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 444.960 us, total = 26.698 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.648 us, total = 158.876 us, Queueing time: mean = 166.691 us, max = 1.601 ms, min = 9.965 us, total = 10.001 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 88.726 us, total = 5.324 ms, Queueing time: mean = 62.963 us, max = 144.807 us, min = 18.080 us, total = 3.778 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 5.333 us, total = 319.974 us, Queueing time: mean = 164.668 us, max = 1.601 ms, min = 9.369 us, total = 9.880 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 6.128 us, total = 128.679 us, Queueing time: mean = 44.754 us, max = 79.929 us, min = 18.801 us, total = 939.843 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 478.628 us, total = 5.744 ms, Queueing time: mean = 317.694 us, max = 1.157 ms, min = 15.848 us, total = 3.812 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.068 ms, total = 12.819 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 211.123 us, total = 2.533 ms, Queueing time: mean = 557.811 us, max = 1.495 ms, min = 104.832 us, total = 6.694 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 36.833 us, total = 441.995 us, Queueing time: mean = 68.021 us, max = 151.848 us, min = 20.254 us, total = 816.253 us
[state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.448 ms, total = 8.688 ms, Queueing time: mean = 27.821 us, max = 58.652 us, min = 11.663 us, total = 166.926 us
[state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us
[state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total = 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us
[state-dump] DebugString() time ms: 2
[state-dump]
[state-dump]
[2025-01-21 05:14:16,196 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:14:16,217 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, 
node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean 
= -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 10780 total (35 active) [state-dump] Queueing time: mean = 264.317 us, 
max = 825.882 ms, min = 171.000 ns, total = 2.849 s [state-dump] Execution time: mean = 225.278 us, total = 2.428 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2520 total (0 active), Execution time: mean = 377.491 us, total = 951.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2520 total (0 active), Execution time: mean = 25.821 us, total = 65.069 ms, Queueing time: mean = 75.713 us, max = 23.460 ms, min = 4.270 us, total = 190.796 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1200 total (1 active), Execution time: mean = 9.557 us, total = 11.468 ms, Queueing time: mean = 48.188 us, max = 393.108 us, min = 9.690 us, total = 57.825 ms [state-dump] NodeManager.CheckGC - 1200 total (1 active), Execution time: mean = 3.034 us, total = 3.641 ms, Queueing time: mean = 53.258 us, max = 395.063 us, min = 8.858 us, total = 63.909 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 4.194 us, total = 5.028 ms, Queueing time: mean = 65.253 us, max = 398.806 us, min = 7.083 us, total = 78.238 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 13.992 us, total = 8.395 ms, Queueing time: mean = 50.166 us, max = 1.134 ms, min = 8.817 us, total = 30.099 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 480 total (1 active), Execution time: mean = 415.143 us, total = 199.269 ms, Queueing time: mean = 50.528 us, max = 169.160 us, min = 10.690 us, total = 24.254 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 121 total (1 active), Execution time: mean = 10.434 us, total = 1.263 ms, Queueing time: mean = 41.982 us, max = 146.330 us, min = 12.092 us, total = 5.080 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 6.050 
us, total = 725.963 us, Queueing time: mean = 151.972 us, max = 1.601 ms, min = 9.369 us, total = 18.237 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 2.803 us, total = 336.388 us, Queueing time: mean = 154.246 us, max = 1.601 ms, min = 9.965 us, total = 18.510 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 119 total (0 active), Execution time: mean = 471.457 us, total = 56.103 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 119 total (0 active), Execution time: mean = 89.216 us, total = 10.617 ms, Queueing time: mean = 69.445 us, max = 161.625 us, min = 15.809 us, total = 8.264 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s [state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 6.478 us, total = 265.615 us, Queueing time: mean = 46.901 us, max = 131.679 us, min = 17.556 us, total = 1.923 ms [state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 482.159 us, total = 11.572 ms, Queueing time: mean = 278.141 us, max = 1.157 ms, min = 14.980 us, total = 6.675 ms [state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 216.618 us, total = 5.199 ms, Queueing time: mean = 529.935 us, max = 1.495 ms, min = 90.262 us, total = 12.718 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.090 ms, 
total = 26.150 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 38.156 us, total = 915.749 us, Queueing time: mean = 80.243 us, max = 156.435 us, min = 20.254 us, total = 1.926 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.444 ms, total = 17.324 ms, Queueing time: mean = 37.148 us, max = 58.652 us, min = 11.663 us, total = 445.782 us [state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 
517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us [state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total = 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: 
mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.433 ms, total = 2.867 ms, Queueing time: mean = 28.391 us, max = 56.782 us, min = 56.782 us, total = 56.782 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 
active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:15:16,196 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:15:16,220 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, 
node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean 
= -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 16011 total (35 active) [state-dump] Queueing time: mean = 194.501 us, 
max = 825.882 ms, min = 171.000 ns, total = 3.114 s
[state-dump] Execution time: mean = 196.293 us, total = 3.143 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3780 total (0 active), Execution time: mean = 381.194 us, total = 1.441 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3780 total (0 active), Execution time: mean = 26.041 us, total = 98.436 ms, Queueing time: mean = 69.611 us, max = 23.460 ms, min = 4.270 us, total = 263.129 ms
[state-dump] RaySyncer.OnDemandBroadcasting - 1799 total (1 active), Execution time: mean = 9.290 us, total = 16.713 ms, Queueing time: mean = 54.985 us, max = 478.555 us, min = 9.690 us, total = 98.918 ms
[state-dump] NodeManager.CheckGC - 1799 total (1 active), Execution time: mean = 3.031 us, total = 5.453 ms, Queueing time: mean = 59.984 us, max = 479.298 us, min = 8.858 us, total = 107.911 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 1798 total (0 active), Execution time: mean = 4.172 us, total = 7.500 ms, Queueing time: mean = 63.411 us, max = 443.107 us, min = 4.310 us, total = 114.014 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 14.387 us, total = 12.948 ms, Queueing time: mean = 51.570 us, max = 1.134 ms, min = 8.817 us, total = 46.413 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 420.999 us, total = 302.698 ms, Queueing time: mean = 52.638 us, max = 1.167 ms, min = 10.409 us, total = 37.847 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 181 total (1 active), Execution time: mean = 10.749 us, total = 1.946 ms, Queueing time: mean = 41.551 us, max = 146.330 us, min = 12.092 us, total = 7.521 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 6.193 us, total = 1.115 ms, Queueing time: mean = 159.331 us, max = 1.601 ms, min = 9.369 us, total = 28.680 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.843 us, total = 511.682 us, Queueing time: mean = 161.660 us, max = 1.601 ms, min = 9.965 us, total = 29.099 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 179 total (0 active), Execution time: mean = 482.167 us, total = 86.308 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 179 total (0 active), Execution time: mean = 88.760 us, total = 15.888 ms, Queueing time: mean = 69.109 us, max = 200.506 us, min = 15.809 us, total = 12.370 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s
[state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 6.487 us, total = 395.708 us, Queueing time: mean = 46.293 us, max = 131.679 us, min = 17.556 us, total = 2.824 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 494.827 us, total = 17.814 ms, Queueing time: mean = 308.546 us, max = 1.157 ms, min = 14.980 us, total = 11.108 ms
[state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 223.426 us, total = 8.043 ms, Queueing time: mean = 571.357 us, max = 1.495 ms, min = 14.834 us, total = 20.569 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 39.263 us, total = 1.413 ms, Queueing time: mean = 73.417 us, max = 156.435 us, min = 19.471 us, total = 2.643 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.126 ms, total = 40.531 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.528 ms, total = 27.509 ms, Queueing time: mean = 36.074 us, max = 70.518 us, min = 11.663 us, total = 649.340 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us
[state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 1.904 ms, total = 5.713 ms, Queueing time: mean = 24.702 us, max = 56.782 us, min = 17.325 us, total = 74.107 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total = 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 05:16:16,196 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:16:16,223 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 21245 total (35 active)
[state-dump] Queueing time: mean = 164.480 us, max = 825.882 ms, min = -0.000 s, total = 3.494 s
[state-dump] Execution time: mean = 190.730 us, total = 4.052 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5040 total (0 active), Execution time: mean = 414.790 us, total = 2.091 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5040 total (0 active), Execution time: mean = 27.990 us, total = 141.071 ms, Queueing time: mean = 77.203 us, max = 23.460 ms, min = 4.270 us, total = 389.101 ms
[state-dump] RaySyncer.OnDemandBroadcasting - 2399 total (1 active), Execution time: mean = 9.792 us, total = 23.492 ms, Queueing time: mean = 63.604 us, max = 478.555 us, min = -0.000 s, total = 152.586 ms
[state-dump] NodeManager.CheckGC - 2399 total (1 active), Execution time: mean = 3.095 us, total = 7.424 ms, Queueing time: mean = 69.066 us, max = 479.298 us, min = 8.858 us, total = 165.689 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 2398 total (0 active), Execution time: mean = 4.510 us, total = 10.816 ms, Queueing time: mean = 70.505 us, max = 474.214 us, min = 4.310 us, total = 169.071 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 15.328 us, total = 18.393 ms, Queueing time: mean = 57.621 us, max = 1.134 ms, min = 8.817 us, total = 69.146 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 959 total (1 active), Execution time: mean = 428.773 us, total = 411.193 ms, Queueing time: mean = 58.429 us, max = 1.167 ms, min = 10.409 us, total = 56.033 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 240 total (1 active), Execution time: mean = 11.966 us, total = 2.872 ms, Queueing time: mean = 47.610 us, max = 146.330 us, min = 12.092 us, total = 11.426 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 3.024 us, total = 725.654 us, Queueing time: mean = 168.443 us, max = 2.015 ms, min = 9.965 us, total = 40.426 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 active), Execution time: mean = 6.763 us, total = 1.623 ms, Queueing time: mean = 165.867 us, max = 2.012 ms, min = 9.369 us, total = 39.808 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 239 total (0 active), Execution time: mean = 524.014 us, total = 125.239 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 239 total (0 active), Execution time: mean = 91.394 us, total = 21.843 ms, Queueing time: mean = 80.395 us, max = 901.407 us, min = 15.163 us, total = 19.214 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 7.217 us, total = 584.572 us, Queueing time: mean = 51.356 us, max = 151.908 us, min = 17.556 us, total = 4.160 ms
[state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 541.563 us, total = 25.995 ms, Queueing time: mean = 299.050 us, max = 1.243 ms, min = 14.980 us, total = 14.354 ms
[state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 247.687 us, total = 11.889 ms, Queueing time: mean = 588.145 us, max = 1.880 ms, min = 14.834 us, total = 28.231 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 42.358 us, total = 2.033 ms, Queueing time: mean = 75.565 us, max = 188.533 us, min = 19.471 us, total = 3.627 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.210 ms, total = 58.103 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.612 ms, total = 38.681 ms, Queueing time: mean = 41.260 us, max = 78.862 us, min = 11.663 us, total = 990.233 us
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us
[state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 2.134 ms, total = 8.535 ms, Queueing time: mean = 31.158 us, max = 56.782 us, min = 17.325 us, total = 124.633 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total = 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us
[state-dump] DebugString() time ms: 2
[state-dump]
[state-dump]
[2025-01-21 05:17:16,197 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:17:16,226 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 26476 total (35 active)
[state-dump] Queueing time: mean = 147.564 us,
max = 825.882 ms, min = -0.000 s, total = 3.907 s [state-dump] Execution time: mean = 188.131 us, total = 4.981 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6300 total (0 active), Execution time: mean = 437.520 us, total = 2.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6300 total (0 active), Execution time: mean = 30.272 us, total = 190.715 ms, Queueing time: mean = 84.092 us, max = 23.460 ms, min = 4.270 us, total = 529.782 ms [state-dump] RaySyncer.OnDemandBroadcasting - 2998 total (1 active), Execution time: mean = 10.023 us, total = 30.048 ms, Queueing time: mean = 68.806 us, max = 579.823 us, min = -0.000 s, total = 206.281 ms [state-dump] NodeManager.CheckGC - 2998 total (1 active), Execution time: mean = 3.145 us, total = 9.427 ms, Queueing time: mean = 74.510 us, max = 582.221 us, min = 8.858 us, total = 223.380 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2997 total (0 active), Execution time: mean = 4.825 us, total = 14.461 ms, Queueing time: mean = 79.125 us, max = 498.532 us, min = 4.310 us, total = 237.137 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 16.378 us, total = 24.567 ms, Queueing time: mean = 63.150 us, max = 1.134 ms, min = 8.817 us, total = 94.725 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 431.893 us, total = 517.408 ms, Queueing time: mean = 63.335 us, max = 1.167 ms, min = 10.409 us, total = 75.875 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 300 total (1 active), Execution time: mean = 12.595 us, total = 3.779 ms, Queueing time: mean = 59.071 us, max = 2.173 ms, min = 12.092 us, total = 17.721 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution 
time: mean = 3.068 us, total = 920.344 us, Queueing time: mean = 169.152 us, max = 2.015 ms, min = 9.965 us, total = 50.746 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 active), Execution time: mean = 7.419 us, total = 2.226 ms, Queueing time: mean = 166.220 us, max = 2.012 ms, min = 9.369 us, total = 49.866 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 299 total (0 active), Execution time: mean = 545.556 us, total = 163.121 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 299 total (0 active), Execution time: mean = 92.896 us, total = 27.776 ms, Queueing time: mean = 85.642 us, max = 901.407 us, min = 15.163 us, total = 25.607 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 7.501 us, total = 757.585 us, Queueing time: mean = 55.202 us, max = 204.789 us, min = 17.556 us, total = 5.575 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s [state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms [state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 552.628 us, total = 33.158 ms, Queueing time: mean = 295.948 us, max = 1.243 ms, min = 14.980 us, total = 17.757 ms [state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 253.118 us, total = 15.187 ms, Queueing time: mean = 590.893 us, max = 1.880 ms, min = 14.834 us, total = 35.454 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution 
time: mean = 45.147 us, total = 2.709 ms, Queueing time: mean = 84.296 us, max = 190.402 us, min = 19.471 us, total = 5.058 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 total (0 active), Execution time: mean = 1.278 ms, total = 76.677 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.626 ms, total = 48.785 ms, Queueing time: mean = 45.688 us, max = 88.076 us, min = 11.663 us, total = 1.371 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.371 ms, total = 
11.855 ms, Queueing time: mean = 38.262 us, max = 66.678 us, min = 17.325 us, total = 191.311 us [state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us [state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total 
= 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), 
Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:18:16,197 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:18:16,229 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, GPU: 20000, 
node:__internal_head__: 10000, memory: 869061529600000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean 
= -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 31707 total (35 active) [state-dump] Queueing time: mean = 136.584 us, 
max = 825.882 ms, min = -0.000 s, total = 4.331 s [state-dump] Execution time: mean = 186.955 us, total = 5.928 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7558 total (0 active), Execution time: mean = 454.913 us, total = 3.438 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7558 total (0 active), Execution time: mean = 31.842 us, total = 240.663 ms, Queueing time: mean = 89.051 us, max = 23.460 ms, min = 4.270 us, total = 673.050 ms [state-dump] RaySyncer.OnDemandBroadcasting - 3598 total (1 active), Execution time: mean = 10.239 us, total = 36.840 ms, Queueing time: mean = 73.708 us, max = 579.823 us, min = -0.000 s, total = 265.201 ms [state-dump] NodeManager.CheckGC - 3598 total (1 active), Execution time: mean = 3.181 us, total = 11.444 ms, Queueing time: mean = 79.633 us, max = 582.221 us, min = 8.858 us, total = 286.518 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 5.028 us, total = 18.087 ms, Queueing time: mean = 84.149 us, max = 498.532 us, min = 4.310 us, total = 302.684 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 17.051 us, total = 30.691 ms, Queueing time: mean = 67.174 us, max = 1.134 ms, min = 8.817 us, total = 120.913 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1438 total (1 active), Execution time: mean = 435.191 us, total = 625.804 ms, Queueing time: mean = 66.457 us, max = 1.167 ms, min = 10.409 us, total = 95.565 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), Execution time: mean = 7.814 us, total = 2.813 ms, Queueing time: mean = 167.732 us, max = 2.012 ms, min = 9.369 us, total = 60.384 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean = 
13.028 us, total = 4.690 ms, Queueing time: mean = 61.023 us, max = 2.173 ms, min = 12.092 us, total = 21.968 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 3.105 us, total = 1.118 ms, Queueing time: mean = 170.875 us, max = 2.015 ms, min = 9.965 us, total = 61.515 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 359 total (0 active), Execution time: mean = 94.530 us, total = 33.936 ms, Queueing time: mean = 90.301 us, max = 901.407 us, min = 15.163 us, total = 32.418 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 359 total (0 active), Execution time: mean = 559.873 us, total = 200.995 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 7.689 us, total = 930.353 us, Queueing time: mean = 57.264 us, max = 204.789 us, min = 17.556 us, total = 6.929 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 87 total (21 active), Execution time: mean = 5.792 us, total = 503.945 us, Queueing time: mean = 26.034 ms, max = 825.882 ms, min = 21.510 us, total = 2.265 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 46.387 us, total = 3.340 ms, Queueing time: mean = 88.554 us, max = 190.402 us, min = 19.471 us, total = 6.376 ms [state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 258.848 us, total = 18.637 ms, Queueing time: mean = 595.744 us, max = 1.880 ms, min = 14.834 us, total = 42.894 ms [state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), Execution time: mean = 556.547 us, total = 40.071 ms, Queueing time: mean = 301.918 us, max = 1.243 ms, min = 14.980 us, total = 21.738 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 
active), Execution time: mean = 1.314 ms, total = 94.627 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessage - 66 total (0 active), Execution time: mean = 1.106 ms, total = 72.984 ms, Queueing time: mean = 46.098 us, max = 626.320 us, min = 4.173 us, total = 3.042 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.650 ms, total = 59.389 ms, Queueing time: mean = 51.810 us, max = 174.696 us, min = 11.663 us, total = 1.865 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), Execution time: mean = 2.413 ms, 
total = 14.478 ms, Queueing time: mean = 42.462 us, max = 66.678 us, min = 17.325 us, total = 254.775 us
[state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 160.869 us, total = 643.474 us, Queueing time: mean = 517.250 ns, max = 734.000 ns, min = 208.000 ns, total = 2.069 us
[state-dump] - 4 total (0 active), Execution time: mean = 852.500 ns, total = 3.410 us, Queueing time: mean = 29.399 us, max = 38.157 us, min = 19.153 us, total = 117.598 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 676.317 us, total = 1.353 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 99.670 us, total = 199.340 us, Queueing time: mean = 29.024 us, max = 29.917 us, min = 28.131 us, total = 58.048 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 372.122 us, total = 744.244 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 30.619 us, total = 61.237 us, Queueing time: mean = 82.947 us, max = 104.884 us, min = 61.010 us, total = 165.894 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 138.240 us, total = 276.481 us, Queueing time: mean = 99.016 us, max = 179.535 us, min = 18.498 us, total = 198.033 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 05:19:16,197 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:19:16,232 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000}}, "available": {memory: 869061529600000, GPU: 20000, CPU: 190000, node:__internal_head__: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 1
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000}
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy {
[state-dump] }
[state-dump] resource_set={CPU : 1, }}: 1/20
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 19
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 36954 total (35 active)
[state-dump] Queueing time: mean = 22.792 ms, max = 418.747 s, min = -0.000 s, total = 842.242 s
[state-dump] Execution time: mean = 186.096 us, total = 6.877 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 8818 total (0 active), Execution time: mean = 32.575 us, total = 287.243 ms, Queueing time: mean = 92.518 us, max = 23.460 ms, min = 4.270 us, total = 815.821 ms
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 8818 total (0 active), Execution time: mean = 467.805 us, total = 4.125 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncer.OnDemandBroadcasting - 4197 total (1 active), Execution time: mean = 10.481 us, total = 43.988 ms, Queueing time: mean = 76.743 us, max = 579.823 us, min = -0.000 s, total = 322.088 ms
[state-dump] NodeManager.CheckGC - 4197 total (1 active), Execution time: mean = 3.198 us, total = 13.423 ms, Queueing time: mean = 82.919 us, max = 582.221 us, min = 8.858 us, total = 348.010 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 4196 total (0 active), Execution time: mean = 5.144 us, total = 21.582 ms, Queueing time: mean = 88.181 us, max = 498.532 us, min = 3.684 us, total = 370.010 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2100 total (1 active), Execution time: mean = 17.327 us, total = 36.387 ms, Queueing time: mean = 68.364 us, max = 1.134 ms, min = 8.817 us, total = 143.564 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1677 total (1 active), Execution time: mean = 436.219 us, total = 731.540 ms, Queueing time: mean = 67.627 us, max = 1.167 ms, min = 7.589 us, total = 113.410 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 420 total (1 active), Execution time: mean = 7.984 us, total = 3.353 ms, Queueing time: mean = 168.877 us, max = 2.012 ms, min = 9.369 us, total = 70.928 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 420 total (1 active), Execution time: mean = 13.474 us, total = 5.659 ms, Queueing time: mean = 63.061 us, max = 2.173 ms, min = 12.092 us, total = 26.486 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 420 total (1 active), Execution time: mean = 3.132 us, total = 1.315 ms, Queueing time: mean = 172.091 us, max = 2.015 ms, min = 9.965 us, total = 72.278 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 419 total (0 active), Execution time: mean = 573.044 us, total = 240.105 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 419 total (0 active), Execution time: mean = 95.429 us, total = 39.985 ms, Queueing time: mean = 94.860 us, max = 901.407 us, min = 15.163 us, total = 39.746 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 141 total (1 active), Execution time: mean = 7.954 us, total = 1.121 ms, Queueing time: mean = 61.079 us, max = 204.789 us, min = 17.556 us, total = 8.612 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 89 total (21 active), Execution time: mean = 5.968 us, total = 531.178 us, Queueing time: mean = 9.435 s, max = 418.747 s, min = 21.510 us, total = 839.758 s
[state-dump] NodeManager.deadline_timer.record_metrics - 84 total (1 active), Execution time: mean = 556.614 us, total = 46.756 ms, Queueing time: mean = 310.396 us, max = 1.243 ms, min = 14.980 us, total = 26.073 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 84 total (0 active), Execution time: mean = 1.327 ms, total = 111.445 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GcsCheckAlive - 84 total (1 active), Execution time: mean = 261.084 us, total = 21.931 ms, Queueing time: mean = 602.505 us, max = 1.880 ms, min = 14.834 us, total = 50.610 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 84 total (0 active), Execution time: mean = 47.427 us, total = 3.984 ms, Queueing time: mean = 93.622 us, max = 190.402 us, min = 19.471 us, total = 7.864 ms
[state-dump] ClientConnection.async_read.ProcessMessage - 68 total (0 active), Execution time: mean = 1.074 ms, total = 73.013 ms, Queueing time: mean = 45.201 us, max = 626.320 us, min = 4.173 us, total = 3.074 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 42 total (1 active), Execution time: mean = 1.667 ms, total = 70.011 ms, Queueing time: mean = 54.582 us, max = 174.696 us, min = 11.663 us, total = 2.292 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.473 ms, total = 17.309 ms, Queueing time: mean = 47.705 us, max = 79.160 us, min = 17.325 us, total = 333.935 us
[state-dump] RaySyncer.BroadcastMessage - 6 total (0 active), Execution time: mean = 186.342 us, total = 1.118 ms, Queueing time: mean = 565.833 ns, max = 734.000 ns, min = 208.000 ns, total = 3.395 us
[state-dump] - 6 total (0 active), Execution time: mean = 1.009 us, total = 6.053 us, Queueing time: mean = 75.494 us, max = 184.577 us, min = 19.153 us, total = 452.964 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 4 total (0 active), Execution time: mean = 813.336 us, total = 3.253 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 4 total (0 active), Execution time: mean = 61.175 us, total = 244.698 us, Queueing time: mean = 53.790 us, max = 104.884 us, min = 16.930 us, total = 215.159 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 4 total (0 active), Execution time: mean = 146.219 us, total = 584.875 us, Queueing time: mean = 85.053 us, max = 179.535 us, min = 18.498 us, total = 340.211 us
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 488.124 us, total = 1.464 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 99.143 us, total = 297.429 us, Queueing time: mean = 56.194 us, max = 110.534 us, min = 28.131 us, total = 168.582 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 05:20:16,198 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:20:16,235 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 869061529600000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 42193 total (35 active)
[state-dump] Queueing time: mean = 29.890 ms, max = 418.747 s, min = -0.000 s, total = 1261.131 s
[state-dump] Execution time: mean = 184.127 us, total = 7.769 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10077 total (0 active), Execution time: mean = 32.761 us, total = 330.132 ms, Queueing time: mean = 93.627 us, max = 23.460 ms, min = 4.270 us, total = 943.478 ms
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 10077 total (0 active), Execution time: mean = 472.559 us, total = 4.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncer.OnDemandBroadcasting - 4797 total (1 active), Execution time: mean = 10.454 us, total = 50.147 ms, Queueing time: mean = 77.093 us, max = 579.823 us, min = -0.000 s, total = 369.817 ms
[state-dump] NodeManager.CheckGC - 4797 total (1 active), Execution time: mean = 3.201 us, total = 15.357 ms, Queueing time: mean = 83.263 us, max = 582.221 us, min = 8.858 us, total = 399.411 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 4796 total (0 active), Execution time: mean = 5.189 us, total = 24.887 ms, Queueing time: mean = 89.329 us, max = 498.532 us, min = 3.684 us, total = 428.423 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2400 total (1 active), Execution time: mean = 17.437 us, total = 41.850 ms, Queueing time: mean = 68.405 us, max = 1.134 ms, min = 7.744 us, total = 164.171 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1917 total (1 active), Execution time: mean = 438.384 us, total = 840.383 ms, Queueing time: mean = 68.498 us, max = 1.167 ms, min = 7.589 us, total = 131.311 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 480 total (1 active), Execution time: mean = 13.766 us, total = 6.608 ms, Queueing time: mean = 63.268 us, max = 2.173 ms, min = 12.092 us, total = 30.368 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 480 total (1 active), Execution time: mean = 8.043 us, total = 3.860 ms, Queueing time: mean = 170.048 us, max = 2.012 ms, min = 9.369 us, total = 81.623 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 480 total (1 active), Execution time: mean = 3.201 us, total = 1.537 ms, Queueing time: mean = 173.209 us, max = 2.015 ms, min = 9.965 us, total = 83.140 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 479 total (0 active), Execution time: mean = 578.126 us, total = 276.922 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 479 total (0 active), Execution time: mean = 96.860 us, total = 46.396 ms, Queueing time: mean = 95.916 us, max = 901.407 us, min = 15.163 us, total = 45.944 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 161 total (1 active), Execution time: mean = 7.901 us, total = 1.272 ms, Queueing time: mean = 61.917 us, max = 204.789 us, min = 17.556 us, total = 9.969 ms
[state-dump] NodeManager.GcsCheckAlive - 96 total (1 active), Execution time: mean = 264.901 us, total = 25.431 ms, Queueing time: mean = 605.812 us, max = 1.880 ms, min = 14.834 us, total = 58.158 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 96 total (1 active), Execution time: mean = 551.145 us, total = 52.910 ms, Queueing time: mean = 322.285 us, max = 1.243 ms, min = 14.980 us, total = 30.939 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 96 total (0 active), Execution time: mean = 47.983 us, total = 4.606 ms, Queueing time: mean = 95.531 us, max = 202.139 us, min = 19.471 us, total = 9.171 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 96 total (0 active), Execution time: mean = 1.331 ms, total = 127.781 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.093 us, total = 548.325 us, Queueing time: mean = 13.981 s, max = 418.747 s, min = 21.510 us, total = 1258.276 s
[state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 1.059 ms, total = 73.046 ms, Queueing time: mean = 45.678 us, max = 626.320 us, min = 4.173 us, total = 3.152 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 48 total (1 active), Execution time: mean = 1.684 ms, total = 80.823 ms, Queueing time: mean = 55.154 us, max = 174.696 us, min = 11.663 us, total = 2.647 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 8 total (1 active, 1 running), Execution time: mean = 2.543 ms, total = 20.346 ms, Queueing time: mean = 51.973 us, max = 81.847 us, min = 17.325 us, total = 415.782 us
[state-dump] - 7 total (0 active), Execution time: mean = 1.078 us, total = 7.546 us, Queueing time: mean = 86.433 us, max = 184.577 us, min = 19.153 us, total = 605.030 us
[state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 190.874 us, total = 1.336 ms, Queueing time: mean = 597.000 ns, max = 784.000 ns, min = 208.000 ns, total = 4.179 us
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 4 total (0 active), Execution time: mean = 104.192 us, total = 416.769 us, Queueing time: mean = 53.008 us, max = 110.534 us, min = 28.131 us, total = 212.031 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 4 total (0 active), Execution time: mean = 813.336 us, total = 3.253 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 4 total (0 active), Execution time: mean = 490.280 us, total = 1.961 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 4 total (0 active), Execution time: mean = 61.175 us, total = 244.698 us, Queueing time: mean = 53.790 us, max = 104.884 us, min = 16.930 us, total = 215.159 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 4 total (0 active), Execution time: mean = 146.219 us, total = 584.875 us, Queueing time: mean = 85.053 us, max = 179.535 us, min = 18.498 us, total = 340.211 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total 
(1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:21:16,198 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:21:16,238 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, CPU: 200000, 
object_store_memory: 21474836480000, node:__internal_head__: 10000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 47442 total (35 active) [state-dump] Queueing time: mean = 32.057 ms, max 
= 418.747 s, min = -0.000 s, total = 1520.857 s [state-dump] Execution time: mean = 183.964 us, total = 8.728 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 11335 total (0 active), Execution time: mean = 33.385 us, total = 378.422 ms, Queueing time: mean = 96.395 us, max = 23.460 ms, min = 4.270 us, total = 1.093 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 11335 total (0 active), Execution time: mean = 480.715 us, total = 5.449 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 5396 total (1 active), Execution time: mean = 10.673 us, total = 57.593 ms, Queueing time: mean = 78.708 us, max = 579.823 us, min = -0.000 s, total = 424.710 ms [state-dump] NodeManager.CheckGC - 5396 total (1 active), Execution time: mean = 3.222 us, total = 17.387 ms, Queueing time: mean = 85.090 us, max = 583.097 us, min = 8.858 us, total = 459.144 ms [state-dump] ObjectManager.UpdateAvailableMemory - 5395 total (0 active), Execution time: mean = 5.303 us, total = 28.611 ms, Queueing time: mean = 92.118 us, max = 498.532 us, min = 3.684 us, total = 496.976 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2700 total (1 active), Execution time: mean = 17.702 us, total = 47.797 ms, Queueing time: mean = 69.636 us, max = 1.134 ms, min = 7.744 us, total = 188.018 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2156 total (1 active), Execution time: mean = 441.283 us, total = 951.406 ms, Queueing time: mean = 69.469 us, max = 1.167 ms, min = 7.589 us, total = 149.774 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 540 total (1 active), Execution time: mean = 14.062 us, total = 7.593 ms, Queueing time: mean = 63.521 us, max = 2.173 ms, min = 12.092 us, total = 34.302 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 540 total (1 active), Execution time: mean = 8.111 
us, total = 4.380 ms, Queueing time: mean = 171.164 us, max = 2.012 ms, min = 9.369 us, total = 92.428 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 540 total (1 active), Execution time: mean = 3.234 us, total = 1.746 ms, Queueing time: mean = 174.344 us, max = 2.015 ms, min = 9.965 us, total = 94.146 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 539 total (0 active), Execution time: mean = 584.390 us, total = 314.986 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 539 total (0 active), Execution time: mean = 97.881 us, total = 52.758 ms, Queueing time: mean = 98.754 us, max = 901.407 us, min = 15.163 us, total = 53.229 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 181 total (1 active), Execution time: mean = 7.935 us, total = 1.436 ms, Queueing time: mean = 65.753 us, max = 360.057 us, min = 17.556 us, total = 11.901 ms [state-dump] NodeManager.GcsCheckAlive - 108 total (1 active), Execution time: mean = 267.683 us, total = 28.910 ms, Queueing time: mean = 610.932 us, max = 1.880 ms, min = 14.834 us, total = 65.981 ms [state-dump] NodeManager.deadline_timer.record_metrics - 108 total (1 active), Execution time: mean = 555.375 us, total = 59.980 ms, Queueing time: mean = 326.370 us, max = 1.288 ms, min = 14.980 us, total = 35.248 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 108 total (0 active), Execution time: mean = 48.164 us, total = 5.202 ms, Queueing time: mean = 97.991 us, max = 202.139 us, min = 19.471 us, total = 10.583 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 108 total (0 active), Execution time: mean = 1.345 ms, total = 145.252 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 93 total (21 
active), Execution time: mean = 6.371 us, total = 592.517 us, Queueing time: mean = 16.318 s, max = 418.747 s, min = 21.510 us, total = 1517.577 s [state-dump] ClientConnection.async_read.ProcessMessage - 72 total (0 active), Execution time: mean = 1.015 ms, total = 73.109 ms, Queueing time: mean = 44.460 us, max = 626.320 us, min = 4.173 us, total = 3.201 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 54 total (1 active), Execution time: mean = 1.696 ms, total = 91.582 ms, Queueing time: mean = 57.059 us, max = 174.696 us, min = 11.663 us, total = 3.081 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] - 10 total (0 active), Execution time: mean = 1.070 us, total = 10.698 us, Queueing time: mean = 85.990 us, max = 
184.577 us, min = 19.153 us, total = 859.902 us [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 211.039 us, total = 2.110 ms, Queueing time: mean = 672.800 ns, max = 874.000 ns, min = 208.000 ns, total = 6.728 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 9 total (1 active, 1 running), Execution time: mean = 2.556 ms, total = 23.006 ms, Queueing time: mean = 52.316 us, max = 81.847 us, min = 17.325 us, total = 470.840 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 6 total (0 active), Execution time: mean = 117.637 us, total = 705.820 us, Queueing time: mean = 64.615 us, max = 120.086 us, min = 28.131 us, total = 387.689 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 870.269 us, total = 5.222 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 6 total (0 active), Execution time: mean = 546.049 us, total = 3.276 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 65.382 us, total = 392.291 us, Queueing time: mean = 46.406 us, max = 104.884 us, min = 16.930 us, total = 278.434 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 171.040 us, total = 1.026 ms, Queueing time: mean = 100.279 us, max = 179.535 us, min = 18.498 us, total = 601.673 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 467.288 ms, total = 934.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 
total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 184.095 us, total = 184.095 us, Queueing time: mean = 20.320 us, max = 20.320 us, min = 20.320 us, total = 20.320 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:22:16,199 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:22:16,240 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, CPU: 200000, 
object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 52689 total (35 active)
[state-dump] Queueing time: mean = 33.124 ms, max = 418.747 s, min = -0.000 s, total = 1745.279 s
[state-dump] Execution time: mean = 11.530 ms, total = 607.484 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 12591 total (0 active), Execution time: mean = 33.829 us, total = 425.943 ms, Queueing time: mean = 98.032 us, max = 23.460 ms, min = 4.270 us, total = 1.234 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 12591 total (0 active), Execution time: mean = 485.985 us, total = 6.119 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncer.OnDemandBroadcasting - 5995 total (1 active), Execution time: mean = 10.863 us, total = 65.124 ms, Queueing time: mean = 79.912 us, max = 1.249 ms, min = -0.000 s, total = 479.070 ms
[state-dump] NodeManager.CheckGC - 5995 total (1 active), Execution time: mean = 3.236 us, total = 19.399 ms, Queueing time: mean = 86.478 us, max = 1.325 ms, min = 8.858 us, total = 518.439 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 5994 total (0 active), Execution time: mean = 5.389 us, total = 32.299 ms, Queueing time: mean = 93.821 us, max = 498.532 us, min = 3.684 us, total = 562.363 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2999 total (1 active), Execution time: mean = 17.820 us, total = 53.441 ms, Queueing time: mean = 69.741 us, max = 1.134 ms, min = 7.744 us, total = 209.152 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2396 total (1 active), Execution time: mean = 443.203 us, total = 1.062 s, Queueing time: mean = 69.517 us, max = 1.167 ms, min = 7.589 us, total = 166.562 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 600 total (1 active), Execution time: mean = 14.155 us, total = 8.493 ms, Queueing time: mean = 64.953 us, max = 2.173 ms, min = 12.092 us, total = 38.972 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 600 total (1 active), Execution time: mean = 8.174 us, total = 4.904 ms, Queueing time: mean = 171.542 us, max = 2.012 ms, min = 9.369 us, total = 102.925 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 600 total (1 active), Execution time: mean = 3.254 us, total = 1.952 ms, Queueing time: mean = 174.745 us, max = 2.015 ms, min = 9.965 us, total = 104.847 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 599 total (0 active), Execution time: mean = 588.925 us, total = 352.766 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 599 total (0 active), Execution time: mean = 98.500 us, total = 59.001 ms, Queueing time: mean = 100.457 us, max = 901.407 us, min = 15.163 us, total = 60.174 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 201 total (1 active), Execution time: mean = 7.962 us, total = 1.600 ms, Queueing time: mean = 67.102 us, max = 360.057 us, min = 17.556 us, total = 13.488 ms
[state-dump] NodeManager.GcsCheckAlive - 120 total (1 active), Execution time: mean = 269.644 us, total = 32.357 ms, Queueing time: mean = 609.304 us, max = 1.880 ms, min = 14.834 us, total = 73.116 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 120 total (1 active), Execution time: mean = 553.131 us, total = 66.376 ms, Queueing time: mean = 328.386 us, max = 1.288 ms, min = 10.161 us, total = 39.406 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 120 total (0 active), Execution time: mean = 48.412 us, total = 5.809 ms, Queueing time: mean = 99.036 us, max = 202.139 us, min = 19.471 us, total = 11.884 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 120 total (0 active), Execution time: mean = 1.356 ms, total = 162.684 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 96 total (21 active), Execution time: mean = 6.636 us, total = 637.100 us, Queueing time: mean = 18.142 s, max = 418.747 s, min = 21.510 us, total = 1741.592 s
[state-dump] ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 975.859 us, total = 73.189 ms, Queueing time: mean = 43.339 us, max = 626.320 us, min = 4.173 us, total = 3.250 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 60 total (1 active), Execution time: mean = 1.694 ms, total = 101.657 ms, Queueing time: mean = 59.087 us, max = 174.696 us, min = 11.175 us, total = 3.545 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] - 13 total (0 active), Execution time: mean = 1.077 us, total = 13.996 us, Queueing time: mean = 102.781 us, max = 184.577 us, min = 19.153 us, total = 1.336 ms
[state-dump] RaySyncer.BroadcastMessage - 13 total (0 active), Execution time: mean = 224.799 us, total = 2.922 ms, Queueing time: mean = 709.231 ns, max = 937.000 ns, min = 208.000 ns, total = 9.220 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 10 total (1 active, 1 running), Execution time: mean = 2.582 ms, total = 25.822 ms, Queueing time: mean = 50.458 us, max = 81.847 us, min = 17.325 us, total = 504.583 us
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 125.623 us, total = 1.005 ms, Queueing time: mean = 68.011 us, max = 135.953 us, min = 20.450 us, total = 544.092 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 8 total (0 active), Execution time: mean = 962.700 us, total = 7.702 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 571.840 us, total = 4.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 8 total (0 active), Execution time: mean = 60.985 us, total = 487.880 us, Queueing time: mean = 61.291 us, max = 118.312 us, min = 16.930 us, total = 490.330 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 8 total (0 active), Execution time: mean = 183.604 us, total = 1.469 ms, Queueing time: mean = 120.684 us, max = 240.570 us, min = 18.498 us, total = 965.476 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 05:23:16,199 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0
[2025-01-21 05:23:16,243 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 57919 total (35 active)
[state-dump] Queueing time: mean = 30.140 ms, max = 418.747 s, min = -0.000 s, total = 1745.688 s
[state-dump] Execution time: mean = 10.504 ms, total = 608.410 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13849 total (0 active), Execution time: mean = 34.111 us, total = 472.399 ms, Queueing time: mean = 99.825 us, max = 23.460 ms, min = 4.270 us, total = 1.382 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 13849 total (0 active), Execution time: mean = 490.290 us, total = 6.790 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncer.OnDemandBroadcasting - 6595 total (1 active), Execution time: mean = 10.786 us, total = 71.134 ms, Queueing time: mean = 80.702 us, max = 3.167 ms, min = -0.000 s, total = 532.233 ms
[state-dump] NodeManager.CheckGC - 6595 total (1 active), Execution time: mean = 3.236 us, total = 21.343 ms, Queueing time: mean = 87.204 us, max = 3.183 ms, min = 6.205 us, total = 575.114 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 6594 total (0 active), Execution time: mean = 5.437 us, total = 35.853 ms, Queueing time: mean = 95.355 us, max = 498.532 us, min = 3.684 us, total = 628.774 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3299 total (1 active), Execution time: mean = 17.830 us, total = 58.821 ms, Queueing time: mean = 69.683 us, max = 1.134 ms, min = 7.744 us, total = 229.885 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2635 total (1 active), Execution time: mean = 443.390 us, total = 1.168 s, Queueing time: mean = 69.679 us, max = 1.167 ms, min = 7.589 us, total = 183.605 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 660 total (1 active), Execution time: mean = 14.194 us, total = 9.368 ms, Queueing time: mean = 65.368 us, max = 2.173 ms, min = 12.092 us, total = 43.143 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 660 total (1 active), Execution time: mean = 8.207 us, total = 5.417 ms, Queueing time: mean = 170.955 us, max = 2.012 ms, min = 9.369 us, total = 112.830 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 660 total (1 active), Execution time: mean = 3.256 us, total = 2.149 ms, Queueing time: mean = 174.164 us, max = 2.015 ms, min = 9.965 us, total = 114.948 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 659 total (0 active), Execution time: mean = 592.439 us, total = 390.417 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 659 total (0 active), Execution time: mean = 99.303 us, total = 65.441 ms, Queueing time: mean = 101.560 us, max = 901.407 us, min = 15.163 us, total = 66.928 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 221 total (1 active), Execution time: mean = 7.924 us, total = 1.751 ms, Queueing time: mean = 71.327 us, max = 496.804 us, min = 17.556 us, total = 15.763 ms
[state-dump] NodeManager.GcsCheckAlive - 132 total (1 active), Execution time: mean = 271.348 us, total = 35.818 ms, Queueing time: mean = 608.072 us, max = 1.880 ms, min = 14.834 us, total = 80.266 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 132 total (1 active), Execution time: mean = 548.692 us, total = 72.427 ms, Queueing time: mean = 333.506 us, max = 1.288 ms, min = 10.161 us, total = 44.023 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 132 total (0 active), Execution time: mean = 48.777 us, total = 6.439 ms, Queueing time: mean = 99.596 us, max = 202.139 us, min = 19.471 us, total = 13.147 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 132 total (0 active), Execution time: mean = 1.358 ms, total = 179.243 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 96 total (21 active), Execution time: mean = 6.636 us, total = 637.100 us, Queueing time: mean = 18.142 s, max = 418.747 s, min = 21.510 us, total = 1741.592 s
[state-dump] ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 975.859 us, total = 73.189 ms, Queueing time: mean = 43.339 us, max = 626.320 us, min = 4.173 us, total = 3.250 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 66 total (1 active), Execution time: mean = 1.698 ms, total = 112.061 ms, Queueing time: mean = 59.432 us, max = 174.696 us, min = 11.175 us, total = 3.923 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] - 13 total (0 active), Execution time: mean = 1.077 us, total = 13.996 us, Queueing time: mean = 102.781 us, max = 184.577 us, min = 19.153 us, total = 1.336 ms
[state-dump] RaySyncer.BroadcastMessage - 13 total (0 active), Execution time: mean = 224.799 us, total = 2.922 ms, Queueing time: mean = 709.231 ns, max = 937.000 ns, min = 208.000 ns, total = 9.220 us
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active, 1 running), Execution time: mean = 2.587 ms, total = 28.456 ms, Queueing time: mean = 51.426 us, max = 81.847 us, min = 17.325 us, total = 565.686 us
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 125.623 us, total = 1.005 ms, Queueing time: mean = 68.011 us, max = 135.953 us, min = 20.450 us, total = 544.092 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 8 total (0 active), Execution time: mean = 962.700 us, total = 7.702 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 571.840 us, total = 4.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 8 total (0 active), Execution time: mean = 60.985 us, total = 487.880 us, Queueing time: mean = 61.291 us, max = 118.312 us, min = 16.930 us, total = 490.330 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 8 total (0 active), Execution time: mean = 183.604 us, total = 1.469 ms, Queueing time: mean = 120.684 us, max = 240.570 us, min = 18.498 us, total = 965.476 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] DebugString() time ms: 2
[state-dump]
[state-dump]
[2025-01-21 05:24:16,199 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0
[2025-01-21 05:24:16,246 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 63151 total (35 active) [state-dump] Queueing time: mean = 27.649 ms, max 
= 418.747 s, min = -0.000 s, total = 1746.071 s [state-dump] Execution time: mean = 9.648 ms, total = 609.287 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 15109 total (0 active), Execution time: mean = 34.026 us, total = 514.094 ms, Queueing time: mean = 100.508 us, max = 23.460 ms, min = 3.524 us, total = 1.519 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 15109 total (0 active), Execution time: mean = 491.552 us, total = 7.427 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 7194 total (1 active), Execution time: mean = 10.649 us, total = 76.608 ms, Queueing time: mean = 80.733 us, max = 3.167 ms, min = -0.000 s, total = 580.795 ms [state-dump] NodeManager.CheckGC - 7194 total (1 active), Execution time: mean = 3.216 us, total = 23.137 ms, Queueing time: mean = 87.130 us, max = 3.183 ms, min = 6.205 us, total = 626.814 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7193 total (0 active), Execution time: mean = 5.422 us, total = 38.998 ms, Queueing time: mean = 96.402 us, max = 517.950 us, min = 3.684 us, total = 693.418 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3599 total (1 active), Execution time: mean = 17.636 us, total = 63.471 ms, Queueing time: mean = 69.567 us, max = 1.134 ms, min = 7.744 us, total = 250.372 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2875 total (1 active), Execution time: mean = 442.069 us, total = 1.271 s, Queueing time: mean = 69.626 us, max = 1.438 ms, min = 7.589 us, total = 200.175 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 720 total (1 active), Execution time: mean = 14.201 us, total = 10.225 ms, Queueing time: mean = 65.423 us, max = 2.173 ms, min = 8.343 us, total = 47.105 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 720 total (1 active), Execution time: mean = 8.152 us, 
total = 5.869 ms, Queueing time: mean = 170.052 us, max = 2.012 ms, min = 9.369 us, total = 122.438 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 720 total (1 active), Execution time: mean = 3.280 us, total = 2.361 ms, Queueing time: mean = 173.193 us, max = 2.015 ms, min = 9.965 us, total = 124.699 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 719 total (0 active), Execution time: mean = 592.082 us, total = 425.707 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 719 total (0 active), Execution time: mean = 98.956 us, total = 71.149 ms, Queueing time: mean = 102.954 us, max = 901.407 us, min = 15.163 us, total = 74.024 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 241 total (1 active), Execution time: mean = 7.883 us, total = 1.900 ms, Queueing time: mean = 70.777 us, max = 496.804 us, min = 17.556 us, total = 17.057 ms [state-dump] NodeManager.GcsCheckAlive - 144 total (1 active), Execution time: mean = 271.698 us, total = 39.125 ms, Queueing time: mean = 603.014 us, max = 1.880 ms, min = 14.834 us, total = 86.834 ms [state-dump] NodeManager.deadline_timer.record_metrics - 144 total (1 active), Execution time: mean = 541.247 us, total = 77.940 ms, Queueing time: mean = 335.605 us, max = 1.288 ms, min = 10.161 us, total = 48.327 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 144 total (0 active), Execution time: mean = 48.657 us, total = 7.007 ms, Queueing time: mean = 101.820 us, max = 202.139 us, min = 19.471 us, total = 14.662 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 144 total (0 active), Execution time: mean = 1.355 ms, total = 195.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 96 total (21 
active), Execution time: mean = 6.636 us, total = 637.100 us, Queueing time: mean = 18.142 s, max = 418.747 s, min = 21.510 us, total = 1741.592 s [state-dump] ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 975.859 us, total = 73.189 ms, Queueing time: mean = 43.339 us, max = 626.320 us, min = 4.173 us, total = 3.250 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 72 total (1 active), Execution time: mean = 1.687 ms, total = 121.483 ms, Queueing time: mean = 60.068 us, max = 174.696 us, min = 11.175 us, total = 4.325 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] - 13 total (0 active), Execution time: mean = 1.077 us, total = 13.996 us, Queueing time: mean = 102.781 us, max = 184.577 us, min = 19.153 us, total = 1.336 ms [state-dump] RaySyncer.BroadcastMessage - 13 total (0 active), Execution time: mean = 224.799 us, total = 2.922 ms, Queueing time: mean = 709.231 ns, max = 
937.000 ns, min = 208.000 ns, total = 9.220 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 12 total (1 active, 1 running), Execution time: mean = 2.615 ms, total = 31.384 ms, Queueing time: mean = 53.817 us, max = 81.847 us, min = 17.325 us, total = 645.806 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 125.623 us, total = 1.005 ms, Queueing time: mean = 68.011 us, max = 135.953 us, min = 20.450 us, total = 544.092 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 8 total (0 active), Execution time: mean = 962.700 us, total = 7.702 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 571.840 us, total = 4.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 8 total (0 active), Execution time: mean = 60.985 us, total = 487.880 us, Queueing time: mean = 61.291 us, max = 118.312 us, min = 16.930 us, total = 490.330 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 8 total (0 active), Execution time: mean = 183.604 us, total = 1.469 ms, Queueing time: mean = 120.684 us, max = 240.570 us, min = 18.498 us, total = 965.476 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, 
max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:25:16,200 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:25:16,249 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, CPU: 
200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 68365 total (35 active) [state-dump] Queueing time: mean = 25.546 ms, max 
= 418.747 s, min = -0.000 s, total = 1746.461 s [state-dump] Execution time: mean = 8.925 ms, total = 610.188 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 16359 total (0 active), Execution time: mean = 34.132 us, total = 558.361 ms, Queueing time: mean = 101.500 us, max = 23.460 ms, min = 3.524 us, total = 1.660 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 16359 total (0 active), Execution time: mean = 493.762 us, total = 8.077 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 7794 total (1 active), Execution time: mean = 10.587 us, total = 82.514 ms, Queueing time: mean = 81.005 us, max = 3.167 ms, min = -0.000 s, total = 631.352 ms [state-dump] NodeManager.CheckGC - 7794 total (1 active), Execution time: mean = 3.203 us, total = 24.964 ms, Queueing time: mean = 87.363 us, max = 3.183 ms, min = 6.205 us, total = 680.905 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7793 total (0 active), Execution time: mean = 5.440 us, total = 42.394 ms, Queueing time: mean = 96.861 us, max = 517.950 us, min = 3.639 us, total = 754.836 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3899 total (1 active), Execution time: mean = 17.559 us, total = 68.461 ms, Queueing time: mean = 69.498 us, max = 1.134 ms, min = 7.744 us, total = 270.974 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3114 total (1 active), Execution time: mean = 442.435 us, total = 1.378 s, Queueing time: mean = 69.455 us, max = 1.438 ms, min = 7.589 us, total = 216.283 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 780 total (1 active), Execution time: mean = 14.272 us, total = 11.132 ms, Queueing time: mean = 65.142 us, max = 2.173 ms, min = 8.343 us, total = 50.811 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 780 total (1 active), Execution time: mean = 8.143 us, 
total = 6.352 ms, Queueing time: mean = 169.986 us, max = 2.012 ms, min = 9.369 us, total = 132.589 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 780 total (1 active), Execution time: mean = 3.280 us, total = 2.558 ms, Queueing time: mean = 173.118 us, max = 2.015 ms, min = 9.965 us, total = 135.032 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 779 total (0 active), Execution time: mean = 592.761 us, total = 461.761 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 779 total (0 active), Execution time: mean = 99.022 us, total = 77.138 ms, Queueing time: mean = 104.002 us, max = 901.407 us, min = 15.163 us, total = 81.017 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 261 total (1 active), Execution time: mean = 7.911 us, total = 2.065 ms, Queueing time: mean = 70.195 us, max = 496.804 us, min = 17.556 us, total = 18.321 ms [state-dump] NodeManager.GcsCheckAlive - 156 total (1 active), Execution time: mean = 271.208 us, total = 42.308 ms, Queueing time: mean = 603.893 us, max = 1.880 ms, min = 14.834 us, total = 94.207 ms [state-dump] NodeManager.deadline_timer.record_metrics - 156 total (1 active), Execution time: mean = 540.715 us, total = 84.351 ms, Queueing time: mean = 336.515 us, max = 1.288 ms, min = 10.161 us, total = 52.496 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 156 total (0 active), Execution time: mean = 48.774 us, total = 7.609 ms, Queueing time: mean = 100.965 us, max = 202.139 us, min = 19.471 us, total = 15.751 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 156 total (0 active), Execution time: mean = 1.363 ms, total = 212.635 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 96 total (21 
active), Execution time: mean = 6.636 us, total = 637.100 us, Queueing time: mean = 18.142 s, max = 418.747 s, min = 21.510 us, total = 1741.592 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 78 total (1 active), Execution time: mean = 1.686 ms, total = 131.470 ms, Queueing time: mean = 59.468 us, max = 174.696 us, min = 11.175 us, total = 4.639 ms [state-dump] ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 975.859 us, total = 73.189 ms, Queueing time: mean = 43.339 us, max = 626.320 us, min = 4.173 us, total = 3.250 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] - 13 total (0 active), Execution time: mean = 1.077 us, total = 13.996 us, Queueing time: mean = 102.781 us, max = 184.577 us, min = 19.153 us, total = 1.336 ms [state-dump] RaySyncer.BroadcastMessage - 13 total (0 active), Execution time: mean = 224.799 us, total = 2.922 ms, Queueing time: mean = 709.231 ns, max = 
937.000 ns, min = 208.000 ns, total = 9.220 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 13 total (1 active, 1 running), Execution time: mean = 2.610 ms, total = 33.929 ms, Queueing time: mean = 54.259 us, max = 81.847 us, min = 17.325 us, total = 705.371 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 125.623 us, total = 1.005 ms, Queueing time: mean = 68.011 us, max = 135.953 us, min = 20.450 us, total = 544.092 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 8 total (0 active), Execution time: mean = 962.700 us, total = 7.702 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 571.840 us, total = 4.575 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 8 total (0 active), Execution time: mean = 60.985 us, total = 487.880 us, Queueing time: mean = 61.291 us, max = 118.312 us, min = 16.930 us, total = 490.330 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 8 total (0 active), Execution time: mean = 183.604 us, total = 1.469 ms, Queueing time: mean = 120.684 us, max = 240.570 us, min = 18.498 us, total = 965.476 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, 
max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:26:16,200 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:26:16,253 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{accelerator_type:A40: 10000, 
node:__internal_head__: 10000, GPU: 20000, memory: 869061529600000, CPU: 200000, object_store_memory: 21474836480000, node:192.168.0.2: 10000}}, "available": {accelerator_type:A40: 10000, object_store_memory: 21474836480000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, CPU: 200000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 73619 total (35 active) [state-dump] Queueing time: mean = 32.766 ms, max 
= 418.747 s, min = -0.000 s, total = 2412.166 s [state-dump] Execution time: mean = 8.301 ms, total = 611.098 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 17619 total (0 active), Execution time: mean = 34.208 us, total = 602.710 ms, Queueing time: mean = 102.035 us, max = 23.460 ms, min = 3.524 us, total = 1.798 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 17619 total (0 active), Execution time: mean = 495.519 us, total = 8.731 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 8393 total (1 active), Execution time: mean = 10.653 us, total = 89.409 ms, Queueing time: mean = 81.055 us, max = 3.167 ms, min = -0.000 s, total = 680.295 ms [state-dump] NodeManager.CheckGC - 8393 total (1 active), Execution time: mean = 3.198 us, total = 26.843 ms, Queueing time: mean = 87.488 us, max = 3.183 ms, min = 6.205 us, total = 734.283 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8392 total (0 active), Execution time: mean = 5.463 us, total = 45.848 ms, Queueing time: mean = 97.303 us, max = 517.950 us, min = 3.639 us, total = 816.565 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4199 total (1 active), Execution time: mean = 17.499 us, total = 73.479 ms, Queueing time: mean = 69.226 us, max = 1.134 ms, min = 7.744 us, total = 290.679 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3354 total (1 active), Execution time: mean = 442.311 us, total = 1.484 s, Queueing time: mean = 69.616 us, max = 1.438 ms, min = 7.589 us, total = 233.493 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 840 total (1 active), Execution time: mean = 14.379 us, total = 12.079 ms, Queueing time: mean = 66.631 us, max = 2.173 ms, min = 8.343 us, total = 55.970 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 840 total (1 active), Execution time: mean = 8.130 us, 
total = 6.829 ms, Queueing time: mean = 169.991 us, max = 2.012 ms, min = 8.522 us, total = 142.792 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 840 total (1 active), Execution time: mean = 3.300 us, total = 2.772 ms, Queueing time: mean = 173.095 us, max = 2.015 ms, min = 5.979 us, total = 145.400 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 839 total (0 active), Execution time: mean = 594.502 us, total = 498.787 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 839 total (0 active), Execution time: mean = 99.060 us, total = 83.112 ms, Queueing time: mean = 104.053 us, max = 901.407 us, min = 15.163 us, total = 87.301 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 281 total (1 active), Execution time: mean = 7.965 us, total = 2.238 ms, Queueing time: mean = 70.407 us, max = 496.804 us, min = 17.556 us, total = 19.784 ms [state-dump] NodeManager.GcsCheckAlive - 168 total (1 active), Execution time: mean = 271.341 us, total = 45.585 ms, Queueing time: mean = 604.416 us, max = 1.880 ms, min = 7.846 us, total = 101.542 ms [state-dump] NodeManager.deadline_timer.record_metrics - 168 total (1 active), Execution time: mean = 534.779 us, total = 89.843 ms, Queueing time: mean = 343.267 us, max = 1.306 ms, min = 10.161 us, total = 57.669 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 168 total (0 active), Execution time: mean = 48.728 us, total = 8.186 ms, Queueing time: mean = 99.743 us, max = 202.139 us, min = 15.929 us, total = 16.757 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 168 total (0 active), Execution time: mean = 1.366 ms, total = 229.531 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 99 total (21 active), 
Execution time: mean = 6.877 us, total = 680.824 us, Queueing time: mean = 24.312 s, max = 418.747 s, min = 21.510 us, total = 2406.910 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 84 total (1 active), Execution time: mean = 1.693 ms, total = 142.228 ms, Queueing time: mean = 59.683 us, max = 174.696 us, min = 11.175 us, total = 5.013 ms [state-dump] ClientConnection.async_read.ProcessMessage - 78 total (0 active), Execution time: mean = 938.968 us, total = 73.239 ms, Queueing time: mean = 43.734 us, max = 626.320 us, min = 4.173 us, total = 3.411 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] - 16 total (0 active), Execution time: mean = 999.938 ns, total = 15.999 us, Queueing time: mean = 113.624 us, max = 324.279 us, min = 19.153 us, total = 1.818 ms [state-dump] RaySyncer.BroadcastMessage - 16 total (0 active), Execution time: mean = 224.788 us, total = 3.597 ms, Queueing time: mean = 701.375 ns, max = 937.000 ns, 
min = 208.000 ns, total = 11.222 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 14 total (1 active, 1 running), Execution time: mean = 2.635 ms, total = 36.889 ms, Queueing time: mean = 54.893 us, max = 81.847 us, min = 17.325 us, total = 768.498 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 124.290 us, total = 1.243 ms, Queueing time: mean = 70.153 us, max = 135.953 us, min = 20.450 us, total = 701.533 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 991.008 us, total = 9.910 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 561.646 us, total = 5.616 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 56.946 us, total = 569.464 us, Queueing time: mean = 66.609 us, max = 118.312 us, min = 16.930 us, total = 666.086 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 179.631 us, total = 1.796 ms, Queueing time: mean = 134.233 us, max = 272.659 us, min = 18.498 us, total = 1.342 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 
620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:27:16,200 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:27:16,256 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, CPU: 
200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 190000, memory: 869061529600000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] 
ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] 
- num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma 
notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 78870 total (35 active) [state-dump] Queueing time: mean = 31.958 ms, max = 418.747 s, min = -0.000 s, total = 2520.503 s [state-dump] Execution time: mean = 7.760 ms, total = 612.038 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 18879 total (0 active), Execution time: mean = 34.453 us, total = 650.440 ms, Queueing time: mean = 102.680 us, max = 23.460 ms, min = 3.524 us, total = 1.939 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 18879 total (0 active), Execution time: mean = 497.974 us, total = 9.401 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 8993 total (1 active), Execution time: mean = 10.734 us, total = 96.530 ms, Queueing time: mean = 81.436 us, max = 3.167 ms, min = -0.000 s, total = 732.352 ms [state-dump] NodeManager.CheckGC - 8993 total (1 active), Execution time: mean = 3.208 us, total = 28.852 ms, Queueing time: mean = 87.944 us, max = 3.183 ms, min = 6.205 us, total = 790.883 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8992 total (0 active), Execution time: mean = 5.510 us, total = 49.544 ms, Queueing time: mean = 98.295 us, max = 517.950 us, min = 3.639 us, total = 883.865 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4499 total (1 active), Execution time: mean = 17.596 us, total = 79.162 ms, Queueing time: mean = 69.508 us, max = 1.134 ms, min = 5.101 us, total = 312.715 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3593 total (1 active), Execution time: mean = 443.657 us, total = 1.594 s, Queueing time: mean = 69.746 us, max = 1.438 ms, min = 7.589 us, total = 250.597 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 900 total (1 active), Execution time: mean = 8.191 us, total = 7.372 ms, Queueing time: 
mean = 170.012 us, max = 2.012 ms, min = 6.711 us, total = 153.011 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 900 total (1 active), Execution time: mean = 14.516 us, total = 13.064 ms, Queueing time: mean = 67.451 us, max = 2.173 ms, min = 8.343 us, total = 60.706 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 900 total (1 active), Execution time: mean = 3.299 us, total = 2.969 ms, Queueing time: mean = 173.154 us, max = 2.015 ms, min = 4.207 us, total = 155.838 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 899 total (0 active), Execution time: mean = 596.687 us, total = 536.421 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 899 total (0 active), Execution time: mean = 100.403 us, total = 90.262 ms, Queueing time: mean = 104.732 us, max = 901.407 us, min = 15.163 us, total = 94.154 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 301 total (1 active), Execution time: mean = 8.050 us, total = 2.423 ms, Queueing time: mean = 70.641 us, max = 496.804 us, min = 17.556 us, total = 21.263 ms [state-dump] NodeManager.deadline_timer.record_metrics - 180 total (1 active), Execution time: mean = 532.565 us, total = 95.862 ms, Queueing time: mean = 345.175 us, max = 1.306 ms, min = 10.161 us, total = 62.132 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 180 total (0 active), Execution time: mean = 1.369 ms, total = 246.450 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 180 total (1 active), Execution time: mean = 272.576 us, total = 49.064 ms, Queueing time: mean = 603.476 us, max = 1.880 ms, min = 6.908 us, total = 108.626 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 180 total (0 active), Execution time: mean = 48.945 us, total = 
8.810 ms, Queueing time: mean = 99.645 us, max = 202.139 us, min = 12.562 us, total = 17.936 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 101 total (21 active), Execution time: mean = 7.019 us, total = 708.907 us, Queueing time: mean = 24.899 s, max = 418.747 s, min = 21.510 us, total = 2514.843 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 90 total (1 active), Execution time: mean = 1.689 ms, total = 151.982 ms, Queueing time: mean = 59.719 us, max = 174.696 us, min = 11.175 us, total = 5.375 ms [state-dump] ClientConnection.async_read.ProcessMessage - 80 total (0 active), Execution time: mean = 915.894 us, total = 73.272 ms, Queueing time: mean = 43.031 us, max = 626.320 us, min = 4.173 us, total = 3.442 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] RaySyncer.BroadcastMessage - 18 total (0 active), Execution time: mean = 226.267 us, total = 4.073 ms, Queueing time: mean = 689.389 ns, max = 
937.000 ns, min = 208.000 ns, total = 12.409 us [state-dump] - 18 total (0 active), Execution time: mean = 1.027 us, total = 18.490 us, Queueing time: mean = 115.957 us, max = 324.279 us, min = 19.153 us, total = 2.087 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 15 total (1 active, 1 running), Execution time: mean = 2.700 ms, total = 40.493 ms, Queueing time: mean = 54.632 us, max = 81.847 us, min = 17.325 us, total = 819.476 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 12 total (0 active), Execution time: mean = 1.070 ms, total = 12.845 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 12 total (0 active), Execution time: mean = 65.195 us, total = 782.334 us, Queueing time: mean = 71.668 us, max = 118.312 us, min = 16.930 us, total = 860.012 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 12 total (0 active), Execution time: mean = 178.918 us, total = 2.147 ms, Queueing time: mean = 148.526 us, max = 330.398 us, min = 18.498 us, total = 1.782 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 11 total (0 active), Execution time: mean = 572.735 us, total = 6.300 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 11 total (0 active), Execution time: mean = 124.382 us, total = 1.368 ms, Queueing time: mean = 74.186 us, max = 135.953 us, min = 20.450 us, total = 816.044 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min 
= 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: 
mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:28:16,200 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created 
total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:28:16,258 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, 
"labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, memory: 869061529600000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 
[state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 84108 total (35 active) [state-dump] Queueing time: mean = 31.219 ms, max 
= 418.747 s, min = -0.000 s, total = 2625.752 s [state-dump] Execution time: mean = 7.287 ms, total = 612.855 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 20139 total (0 active), Execution time: mean = 34.319 us, total = 691.146 ms, Queueing time: mean = 102.291 us, max = 23.460 ms, min = 3.524 us, total = 2.060 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 20139 total (0 active), Execution time: mean = 495.562 us, total = 9.980 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 9592 total (1 active), Execution time: mean = 10.656 us, total = 102.210 ms, Queueing time: mean = 81.192 us, max = 3.167 ms, min = -0.000 s, total = 778.791 ms [state-dump] NodeManager.CheckGC - 9592 total (1 active), Execution time: mean = 3.191 us, total = 30.610 ms, Queueing time: mean = 87.649 us, max = 3.183 ms, min = 6.205 us, total = 840.728 ms [state-dump] ObjectManager.UpdateAvailableMemory - 9591 total (0 active), Execution time: mean = 5.459 us, total = 52.358 ms, Queueing time: mean = 97.363 us, max = 517.950 us, min = 3.103 us, total = 933.809 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4799 total (1 active), Execution time: mean = 17.466 us, total = 83.821 ms, Queueing time: mean = 69.086 us, max = 1.134 ms, min = 5.101 us, total = 331.543 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3833 total (1 active), Execution time: mean = 442.470 us, total = 1.696 s, Queueing time: mean = 69.272 us, max = 1.438 ms, min = 6.986 us, total = 265.519 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 960 total (1 active), Execution time: mean = 14.539 us, total = 13.957 ms, Queueing time: mean = 66.981 us, max = 2.173 ms, min = 8.343 us, total = 64.302 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 960 total (1 active), Execution time: mean = 8.138 us, 
total = 7.813 ms, Queueing time: mean = 169.558 us, max = 2.012 ms, min = 6.711 us, total = 162.776 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 960 total (1 active), Execution time: mean = 3.288 us, total = 3.156 ms, Queueing time: mean = 172.682 us, max = 2.015 ms, min = 4.207 us, total = 165.775 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 959 total (0 active), Execution time: mean = 596.829 us, total = 572.359 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 959 total (0 active), Execution time: mean = 100.283 us, total = 96.171 ms, Queueing time: mean = 104.621 us, max = 901.407 us, min = 15.163 us, total = 100.331 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 321 total (1 active), Execution time: mean = 8.074 us, total = 2.592 ms, Queueing time: mean = 70.294 us, max = 496.804 us, min = 17.556 us, total = 22.564 ms [state-dump] NodeManager.GcsCheckAlive - 192 total (1 active), Execution time: mean = 271.246 us, total = 52.079 ms, Queueing time: mean = 602.121 us, max = 1.880 ms, min = 5.323 us, total = 115.607 ms [state-dump] NodeManager.deadline_timer.record_metrics - 192 total (1 active), Execution time: mean = 531.498 us, total = 102.048 ms, Queueing time: mean = 344.173 us, max = 1.306 ms, min = 8.783 us, total = 66.081 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 192 total (0 active), Execution time: mean = 48.752 us, total = 9.360 ms, Queueing time: mean = 99.383 us, max = 202.139 us, min = 12.562 us, total = 19.082 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 192 total (0 active), Execution time: mean = 1.356 ms, total = 260.364 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 102 total (21 
active), Execution time: mean = 7.072 us, total = 721.358 us, Queueing time: mean = 25.684 s, max = 418.747 s, min = 21.510 us, total = 2619.747 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 96 total (1 active), Execution time: mean = 1.689 ms, total = 162.176 ms, Queueing time: mean = 59.440 us, max = 174.696 us, min = 11.175 us, total = 5.706 ms [state-dump] ClientConnection.async_read.ProcessMessage - 81 total (0 active), Execution time: mean = 904.906 us, total = 73.297 ms, Queueing time: mean = 42.849 us, max = 626.320 us, min = 4.173 us, total = 3.471 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] - 19 total (0 active), Execution time: mean = 1.006 us, total = 19.121 us, Queueing time: mean = 111.341 us, max = 324.279 us, min = 19.153 us, total = 2.115 ms [state-dump] RaySyncer.BroadcastMessage - 19 total (0 active), Execution time: mean = 224.502 us, total = 4.266 ms, Queueing time: mean = 681.526 ns, max = 
937.000 ns, min = 208.000 ns, total = 12.949 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 16 total (1 active, 1 running), Execution time: mean = 2.711 ms, total = 43.373 ms, Queueing time: mean = 55.665 us, max = 81.847 us, min = 17.325 us, total = 890.639 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 12 total (0 active), Execution time: mean = 121.878 us, total = 1.463 ms, Queueing time: mean = 70.305 us, max = 135.953 us, min = 20.450 us, total = 843.656 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 12 total (0 active), Execution time: mean = 1.070 ms, total = 12.845 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 12 total (0 active), Execution time: mean = 573.932 us, total = 6.887 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 12 total (0 active), Execution time: mean = 65.195 us, total = 782.334 us, Queueing time: mean = 71.668 us, max = 118.312 us, min = 16.930 us, total = 860.012 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 12 total (0 active), Execution time: mean = 178.918 us, total = 2.147 ms, Queueing time: mean = 148.526 us, max = 330.398 us, min = 18.498 us, total = 1.782 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, 
max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:29:16,201 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:29:16,260 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 
869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, memory: 869061529600000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 89342 total (35 active) [state-dump] Queueing time: mean = 29.394 ms, max 
= 418.747 s, min = -0.000 s, total = 2626.160 s [state-dump] Execution time: mean = 6.870 ms, total = 613.778 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 21399 total (0 active), Execution time: mean = 34.483 us, total = 737.901 ms, Queueing time: mean = 103.030 us, max = 23.460 ms, min = 3.524 us, total = 2.205 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 21399 total (0 active), Execution time: mean = 497.744 us, total = 10.651 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 10192 total (1 active), Execution time: mean = 10.631 us, total = 108.352 ms, Queueing time: mean = 81.579 us, max = 3.167 ms, min = -0.000 s, total = 831.457 ms [state-dump] NodeManager.CheckGC - 10192 total (1 active), Execution time: mean = 3.187 us, total = 32.479 ms, Queueing time: mean = 88.023 us, max = 3.183 ms, min = 6.205 us, total = 897.128 ms [state-dump] ObjectManager.UpdateAvailableMemory - 10191 total (0 active), Execution time: mean = 5.485 us, total = 55.899 ms, Queueing time: mean = 98.253 us, max = 656.726 us, min = 3.103 us, total = 1.001 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5099 total (1 active), Execution time: mean = 17.398 us, total = 88.713 ms, Queueing time: mean = 69.031 us, max = 1.134 ms, min = 5.101 us, total = 351.989 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4072 total (1 active), Execution time: mean = 442.381 us, total = 1.801 s, Queueing time: mean = 69.472 us, max = 1.438 ms, min = 6.986 us, total = 282.892 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1020 total (1 active), Execution time: mean = 14.642 us, total = 14.935 ms, Queueing time: mean = 66.930 us, max = 2.173 ms, min = 8.343 us, total = 68.269 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1020 total (1 active), Execution time: mean = 8.183 us, 
total = 8.347 ms, Queueing time: mean = 170.427 us, max = 2.012 ms, min = 6.711 us, total = 173.835 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1020 total (1 active), Execution time: mean = 3.282 us, total = 3.347 ms, Queueing time: mean = 173.572 us, max = 2.015 ms, min = 4.207 us, total = 177.044 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1019 total (0 active), Execution time: mean = 598.139 us, total = 609.504 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1019 total (0 active), Execution time: mean = 100.327 us, total = 102.233 ms, Queueing time: mean = 104.896 us, max = 901.407 us, min = 15.163 us, total = 106.889 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 341 total (1 active), Execution time: mean = 8.047 us, total = 2.744 ms, Queueing time: mean = 69.680 us, max = 496.804 us, min = 17.556 us, total = 23.761 ms [state-dump] NodeManager.GcsCheckAlive - 204 total (1 active), Execution time: mean = 270.157 us, total = 55.112 ms, Queueing time: mean = 606.086 us, max = 2.157 ms, min = 5.323 us, total = 123.642 ms [state-dump] NodeManager.deadline_timer.record_metrics - 204 total (1 active), Execution time: mean = 528.154 us, total = 107.743 ms, Queueing time: mean = 350.149 us, max = 1.805 ms, min = 8.783 us, total = 71.430 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 204 total (0 active), Execution time: mean = 48.388 us, total = 9.871 ms, Queueing time: mean = 98.657 us, max = 202.139 us, min = 12.562 us, total = 20.126 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 204 total (0 active), Execution time: mean = 1.352 ms, total = 275.777 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 102 total (21 
active), Execution time: mean = 7.072 us, total = 721.358 us, Queueing time: mean = 25.684 s, max = 418.747 s, min = 21.510 us, total = 2619.747 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 102 total (1 active), Execution time: mean = 1.693 ms, total = 172.666 ms, Queueing time: mean = 62.759 us, max = 174.696 us, min = 11.175 us, total = 6.401 ms [state-dump] ClientConnection.async_read.ProcessMessage - 81 total (0 active), Execution time: mean = 904.906 us, total = 73.297 ms, Queueing time: mean = 42.849 us, max = 626.320 us, min = 4.173 us, total = 3.471 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] - 19 total (0 active), Execution time: mean = 1.006 us, total = 19.121 us, Queueing time: mean = 111.341 us, max = 324.279 us, min = 19.153 us, total = 2.115 ms [state-dump] RaySyncer.BroadcastMessage - 19 total (0 active), Execution time: mean = 224.502 us, total = 4.266 ms, Queueing time: mean = 681.526 ns, max = 
937.000 ns, min = 208.000 ns, total = 12.949 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 17 total (1 active, 1 running), Execution time: mean = 2.717 ms, total = 46.186 ms, Queueing time: mean = 55.589 us, max = 81.847 us, min = 17.325 us, total = 945.021 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 12 total (0 active), Execution time: mean = 121.878 us, total = 1.463 ms, Queueing time: mean = 70.305 us, max = 135.953 us, min = 20.450 us, total = 843.656 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 12 total (0 active), Execution time: mean = 1.070 ms, total = 12.845 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 12 total (0 active), Execution time: mean = 573.932 us, total = 6.887 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 12 total (0 active), Execution time: mean = 65.195 us, total = 782.334 us, Queueing time: mean = 71.668 us, max = 118.312 us, min = 16.930 us, total = 860.012 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 12 total (0 active), Execution time: mean = 178.918 us, total = 2.147 ms, Queueing time: mean = 148.526 us, max = 330.398 us, min = 18.498 us, total = 1.782 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, 
max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:30:16,201 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:30:16,263 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, 
node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 
[state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively 
pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 
[state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 94590 total (35 active) [state-dump] Queueing time: mean = 31.422 ms, max = 418.747 s, min = -0.000 s, total = 2972.192 s [state-dump] Execution time: mean = 6.495 ms, total = 614.360 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 22659 total (0 active), Execution time: mean = 486.024 us, total = 11.013 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 22659 total (0 active), Execution time: mean = 33.997 us, total = 770.328 ms, Queueing time: mean = 99.611 us, max = 23.460 ms, min = 3.524 us, total = 2.257 s [state-dump] RaySyncer.OnDemandBroadcasting - 10791 total (1 active), Execution time: mean = 10.577 us, total = 114.135 ms, Queueing time: mean = 80.212 us, max = 3.167 ms, min = -0.000 s, total = 865.573 ms [state-dump] NodeManager.CheckGC - 10791 total (1 active), Execution time: mean = 3.177 us, total = 34.287 ms, Queueing time: mean = 86.626 us, max = 3.183 ms, min = 6.205 us, total = 934.786 ms [state-dump] ObjectManager.UpdateAvailableMemory - 10790 total (0 active), Execution time: mean = 5.398 us, total = 58.240 ms, Queueing time: mean = 94.968 us, max = 656.726 us, min = 3.103 us, total = 1.025 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5399 total (1 active), Execution time: mean = 17.282 us, total = 93.307 ms, Queueing time: mean = 67.874 us, max = 1.134 ms, min = 5.101 us, total = 366.449 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4312 total (1 active), Execution time: mean = 441.522 us, total = 1.904 s, Queueing time: mean = 67.950 us, max = 1.438 ms, min = 6.986 us, total = 293.001 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1080 total (1 active), Execution time: mean 
= 3.263 us, total = 3.524 ms, Queueing time: mean = 172.658 us, max = 2.015 ms, min = 4.207 us, total = 186.470 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1080 total (1 active), Execution time: mean = 14.581 us, total = 15.747 ms, Queueing time: mean = 67.845 us, max = 2.658 ms, min = 8.343 us, total = 73.273 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1080 total (1 active), Execution time: mean = 8.113 us, total = 8.762 ms, Queueing time: mean = 169.543 us, max = 2.012 ms, min = 6.711 us, total = 183.106 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1079 total (0 active), Execution time: mean = 589.057 us, total = 635.593 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1079 total (0 active), Execution time: mean = 100.879 us, total = 108.848 ms, Queueing time: mean = 102.813 us, max = 901.407 us, min = 15.163 us, total = 110.935 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 361 total (1 active), Execution time: mean = 8.078 us, total = 2.916 ms, Queueing time: mean = 68.647 us, max = 496.804 us, min = 17.556 us, total = 24.782 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 216 total (0 active), Execution time: mean = 47.889 us, total = 10.344 ms, Queueing time: mean = 95.836 us, max = 202.139 us, min = 12.562 us, total = 20.701 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 216 total (0 active), Execution time: mean = 1.332 ms, total = 287.668 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 216 total (1 active), Execution time: mean = 268.488 us, total = 57.994 ms, Queueing time: mean = 606.628 us, max = 2.157 ms, min = 5.323 us, total = 131.032 ms [state-dump] NodeManager.deadline_timer.record_metrics - 216 total (1 active), Execution 
time: mean = 524.047 us, total = 113.194 ms, Queueing time: mean = 352.796 us, max = 1.805 ms, min = 8.783 us, total = 76.204 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 108 total (1 active), Execution time: mean = 1.689 ms, total = 182.451 ms, Queueing time: mean = 61.101 us, max = 174.696 us, min = 11.175 us, total = 6.599 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 
115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 18 total (1 active, 1 running), Execution time: mean = 2.649 ms, total = 47.681 ms, Queueing time: mean = 54.202 us, max = 81.847 us, min = 17.325 us, total = 975.631 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 
0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total 
(0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:31:16,201 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 
2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:31:16,266 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, 
"labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 
[state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request 
bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - 
cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 99824 total (35 active) [state-dump] Queueing time: mean = 29.778 ms, max = 418.747 s, min = -0.000 s, total = 2972.516 s [state-dump] Execution time: mean = 6.162 ms, total = 615.131 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 23919 total (0 active), Execution time: mean = 482.865 us, total = 11.550 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 23919 total (0 active), Execution time: mean = 33.956 us, total = 812.203 ms, Queueing time: mean = 98.986 us, max = 23.460 ms, min = 3.300 us, total = 2.368 s [state-dump] RaySyncer.OnDemandBroadcasting - 11391 total (1 active), Execution time: mean = 10.471 us, total = 119.274 ms, Queueing time: mean = 79.756 us, max = 3.167 ms, min = -0.000 s, total = 908.500 ms [state-dump] NodeManager.CheckGC - 11391 total (1 active), Execution time: mean = 3.165 us, total = 36.055 ms, Queueing time: mean = 86.082 us, max = 3.183 ms, min = 6.205 us, total = 980.563 ms [state-dump] ObjectManager.UpdateAvailableMemory - 11390 total (0 active), Execution time: mean = 5.351 us, total = 60.950 ms, Queueing time: mean = 94.228 us, max = 656.726 us, min = 3.103 us, total = 1.073 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5698 total (1 active), Execution time: mean = 17.146 us, total = 97.697 ms, Queueing time: mean = 67.385 us, max = 1.134 ms, min = 5.101 us, total = 383.958 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4552 total (1 active), Execution time: mean = 440.746 us, total = 2.006 s, Queueing time: mean = 
67.316 us, max = 1.438 ms, min = 6.986 us, total = 306.421 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1140 total (1 active), Execution time: mean = 3.253 us, total = 3.709 ms, Queueing time: mean = 173.248 us, max = 3.551 ms, min = 4.207 us, total = 197.503 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1140 total (1 active), Execution time: mean = 14.520 us, total = 16.552 ms, Queueing time: mean = 67.068 us, max = 2.658 ms, min = 8.343 us, total = 76.457 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1140 total (1 active), Execution time: mean = 8.083 us, total = 9.215 ms, Queueing time: mean = 170.146 us, max = 3.537 ms, min = 6.711 us, total = 193.966 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1139 total (0 active), Execution time: mean = 585.495 us, total = 666.879 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1139 total (0 active), Execution time: mean = 101.439 us, total = 115.539 ms, Queueing time: mean = 101.973 us, max = 901.407 us, min = 14.850 us, total = 116.148 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 381 total (1 active), Execution time: mean = 8.051 us, total = 3.067 ms, Queueing time: mean = 67.957 us, max = 496.804 us, min = 17.556 us, total = 25.892 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 228 total (0 active), Execution time: mean = 47.942 us, total = 10.931 ms, Queueing time: mean = 95.461 us, max = 202.139 us, min = 12.562 us, total = 21.765 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 228 total (0 active), Execution time: mean = 1.322 ms, total = 301.515 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 228 total (1 active), Execution time: mean = 267.402 us, total = 
60.968 ms, Queueing time: mean = 609.475 us, max = 2.157 ms, min = 5.323 us, total = 138.960 ms [state-dump] NodeManager.deadline_timer.record_metrics - 228 total (1 active), Execution time: mean = 521.383 us, total = 118.875 ms, Queueing time: mean = 356.568 us, max = 1.805 ms, min = 8.783 us, total = 81.297 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 114 total (1 active), Execution time: mean = 1.690 ms, total = 192.609 ms, Queueing time: mean = 61.381 us, max = 174.696 us, min = 11.175 us, total = 6.997 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms 
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 19 total (1 active, 1 running), Execution time: mean = 2.663 ms, total = 50.591 ms, Queueing time: mean = 53.419 us, max = 81.847 us, min = 17.325 us, total = 1.015 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 
953.485 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.584 s, total = 598.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 302.530 us, total = 605.060 us, Queueing time: mean = 102.561 us, max = 184.802 us, min = 20.320 us, total = 205.122 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 
57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, 
total = 186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:32:16,202 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:32:16,268 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: 
[10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 
0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - 
task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published 
messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 105057 total (35 active) [state-dump] Queueing time: mean = 28.298 ms, max = 418.747 s, min = -0.000 s, total = 2972.915 s [state-dump] Execution time: mean = 11.575 ms, total = 1216.034 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 25179 total (0 active), Execution time: mean = 484.138 us, total = 12.190 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 25179 total (0 active), Execution time: mean = 34.254 us, total = 862.483 ms, Queueing time: mean = 99.836 us, max = 23.460 ms, min = 3.300 us, total = 2.514 s [state-dump] RaySyncer.OnDemandBroadcasting - 11990 total (1 active), Execution time: mean = 10.458 us, total = 125.388 ms, Queueing time: mean = 79.977 us, max = 3.167 ms, min = -0.000 s, total = 958.921 ms [state-dump] NodeManager.CheckGC - 11990 total (1 active), Execution time: mean = 3.167 us, total = 37.974 ms, Queueing time: mean = 86.292 us, max = 3.183 ms, min = 6.205 us, total = 1.035 s [state-dump] ObjectManager.UpdateAvailableMemory - 11989 total (0 active), Execution time: mean = 5.368 us, total = 64.353 ms, Queueing time: mean = 94.642 us, max = 656.726 us, min = 3.103 us, total = 1.135 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5998 total (1 active), Execution time: mean = 17.157 us, total = 102.909 ms, Queueing time: mean = 67.454 us, max = 1.134 ms, min = 5.101 us, total = 
404.588 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4791 total (1 active), Execution time: mean = 440.999 us, total = 2.113 s, Queueing time: mean = 67.460 us, max = 1.438 ms, min = 6.986 us, total = 323.203 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1200 total (1 active), Execution time: mean = 3.261 us, total = 3.913 ms, Queueing time: mean = 173.996 us, max = 3.551 ms, min = 4.207 us, total = 208.795 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1200 total (1 active), Execution time: mean = 14.588 us, total = 17.505 ms, Queueing time: mean = 66.992 us, max = 2.658 ms, min = 8.343 us, total = 80.391 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1200 total (1 active), Execution time: mean = 8.119 us, total = 9.743 ms, Queueing time: mean = 170.882 us, max = 3.537 ms, min = 6.711 us, total = 205.059 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1199 total (0 active), Execution time: mean = 586.624 us, total = 703.363 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1199 total (0 active), Execution time: mean = 102.348 us, total = 122.715 ms, Queueing time: mean = 101.961 us, max = 901.407 us, min = 14.850 us, total = 122.252 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 401 total (1 active), Execution time: mean = 8.130 us, total = 3.260 ms, Queueing time: mean = 67.725 us, max = 496.804 us, min = 17.556 us, total = 27.158 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 240 total (0 active), Execution time: mean = 48.083 us, total = 11.540 ms, Queueing time: mean = 96.438 us, max = 202.139 us, min = 12.562 us, total = 23.145 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 240 total (0 active), Execution time: mean = 1.327 ms, total = 318.575 ms, Queueing time: mean = 
0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 240 total (1 active), Execution time: mean = 268.233 us, total = 64.376 ms, Queueing time: mean = 613.064 us, max = 2.157 ms, min = 5.323 us, total = 147.135 ms [state-dump] NodeManager.deadline_timer.record_metrics - 240 total (1 active), Execution time: mean = 522.346 us, total = 125.363 ms, Queueing time: mean = 360.185 us, max = 1.805 ms, min = 8.783 us, total = 86.444 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 120 total (1 active), Execution time: mean = 1.697 ms, total = 203.616 ms, Queueing time: mean = 64.142 us, max = 174.696 us, min = 11.175 us, total = 7.697 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 20 total (1 active, 1 running), Execution time: mean = 2.678 ms, total = 53.552 ms, Queueing time: mean = 53.263 us, max = 81.847 us, min = 17.325 us, total = 1.065 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, 
total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us 
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:33:16,202 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:33:16,270 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: 
{"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} 
scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being 
pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative 
published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 110292 total (35 active) [state-dump] Queueing time: mean = 26.959 ms, max = 418.747 s, min = -0.000 s, total = 2973.317 s [state-dump] Execution time: mean = 11.034 ms, total = 1216.964 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 26439 total (0 active), Execution time: mean = 486.707 us, total = 12.868 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 26439 total (0 active), Execution time: mean = 34.449 us, total = 910.791 ms, Queueing time: mean = 100.850 us, max = 23.460 ms, min = 3.300 us, total = 2.666 s [state-dump] RaySyncer.OnDemandBroadcasting - 12590 total (1 active), Execution time: mean = 10.432 us, total = 131.341 ms, Queueing time: mean = 80.148 us, max = 3.167 ms, min = -0.000 s, total = 1.009 s [state-dump] NodeManager.CheckGC - 12590 total (1 active), Execution time: mean = 3.165 us, total = 39.846 ms, Queueing time: mean = 86.445 us, max = 3.183 ms, min = 6.205 us, total = 1.088 s [state-dump] ObjectManager.UpdateAvailableMemory - 12589 total (0 active), Execution time: mean = 5.386 us, total = 67.800 ms, 
Queueing time: mean = 95.201 us, max = 656.726 us, min = 3.103 us, total = 1.198 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6298 total (1 active), Execution time: mean = 17.149 us, total = 108.003 ms, Queueing time: mean = 67.425 us, max = 1.134 ms, min = 5.101 us, total = 424.640 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5031 total (1 active), Execution time: mean = 440.149 us, total = 2.214 s, Queueing time: mean = 67.527 us, max = 1.438 ms, min = 6.986 us, total = 339.729 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1260 total (1 active), Execution time: mean = 3.256 us, total = 4.102 ms, Queueing time: mean = 173.791 us, max = 3.551 ms, min = 4.207 us, total = 218.977 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1260 total (1 active), Execution time: mean = 14.640 us, total = 18.447 ms, Queueing time: mean = 66.904 us, max = 2.658 ms, min = 8.343 us, total = 84.299 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1260 total (1 active), Execution time: mean = 8.114 us, total = 10.224 ms, Queueing time: mean = 170.678 us, max = 3.537 ms, min = 6.711 us, total = 215.054 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1259 total (0 active), Execution time: mean = 588.784 us, total = 741.279 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1259 total (0 active), Execution time: mean = 102.936 us, total = 129.597 ms, Queueing time: mean = 102.312 us, max = 901.407 us, min = 14.850 us, total = 128.810 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 421 total (1 active), Execution time: mean = 8.171 us, total = 3.440 ms, Queueing time: mean = 68.098 us, max = 496.804 us, min = 17.556 us, total = 28.669 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 252 total (0 active), 
Execution time: mean = 48.031 us, total = 12.104 ms, Queueing time: mean = 96.631 us, max = 202.139 us, min = 6.906 us, total = 24.351 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 252 total (0 active), Execution time: mean = 1.335 ms, total = 336.404 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 252 total (1 active), Execution time: mean = 268.273 us, total = 67.605 ms, Queueing time: mean = 612.457 us, max = 2.157 ms, min = 5.323 us, total = 154.339 ms [state-dump] NodeManager.deadline_timer.record_metrics - 252 total (1 active), Execution time: mean = 522.408 us, total = 131.647 ms, Queueing time: mean = 359.649 us, max = 1.805 ms, min = 8.783 us, total = 90.632 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 126 total (1 active), Execution time: mean = 1.697 ms, total = 213.816 ms, Queueing time: mean = 63.455 us, max = 174.696 us, min = 11.175 us, total = 7.995 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 21 total (1 active, 1 running), Execution time: mean = 2.625 ms, total = 55.115 ms, Queueing 
time: mean = 53.047 us, max = 81.847 us, min = 17.325 us, total = 1.114 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution 
time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us 
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 05:34:16,202 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:34:16,273 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 1
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000}
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy {
[state-dump] }
[state-dump] resource_set={CPU : 1, }}: 1/20
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 19
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 115523 total (35 active)
[state-dump] Queueing time: mean = 25.742 ms, max = 418.747 s, min = -0.000 s, total = 2973.737 s
[state-dump] Execution time: mean = 10.542 ms, total = 1217.891 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 27699 total (0 active), Execution time: mean = 488.757 us, total = 13.538 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 27699 total (0 active), Execution time: mean = 34.656 us, total = 959.947 ms, Queueing time: mean = 101.673 us, max = 23.460 ms, min = 3.300 us, total = 2.816 s
[state-dump] RaySyncer.OnDemandBroadcasting - 13189 total (1 active), Execution time: mean = 10.412 us, total = 137.322 ms, Queueing time: mean = 80.686 us, max = 3.167 ms, min = -0.000 s, total = 1.064 s
[state-dump] NodeManager.CheckGC - 13189 total (1 active), Execution time: mean = 3.163 us, total = 41.722 ms, Queueing time: mean = 86.969 us, max = 3.183 ms, min = 6.205 us, total = 1.147 s
[state-dump] ObjectManager.UpdateAvailableMemory - 13188 total (0 active), Execution time: mean = 5.412 us, total = 71.377 ms, Queueing time: mean = 96.095 us, max = 802.131 us, min = 3.103 us, total = 1.267 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6598 total (1 active), Execution time: mean = 17.195 us, total = 113.452 ms, Queueing time: mean = 67.714 us, max = 1.134 ms, min = 5.101 us, total = 446.780 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5270 total (1 active), Execution time: mean = 440.098 us, total = 2.319 s, Queueing time: mean = 67.765 us, max = 1.438 ms, min = 6.986 us, total = 357.122 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1320 total (1 active), Execution time: mean = 3.251 us, total = 4.291 ms, Queueing time: mean = 173.984 us, max = 3.551 ms, min = 4.207 us, total = 229.659 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1320 total (1 active), Execution time: mean = 14.711 us, total = 19.419 ms, Queueing time: mean = 66.919 us, max = 2.658 ms, min = 8.343 us, total = 88.333 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1320 total (1 active), Execution time: mean = 8.153 us, total = 10.761 ms, Queueing time: mean = 170.846 us, max = 3.537 ms, min = 6.711 us, total = 225.517 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1319 total (0 active), Execution time: mean = 590.628 us, total = 779.038 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1319 total (0 active), Execution time: mean = 103.404 us, total = 136.390 ms, Queueing time: mean = 103.268 us, max = 901.407 us, min = 14.850 us, total = 136.211 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 441 total (1 active), Execution time: mean = 8.240 us, total = 3.634 ms, Queueing time: mean = 68.054 us, max = 496.804 us, min = 17.556 us, total = 30.012 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 264 total (0 active), Execution time: mean = 48.106 us, total = 12.700 ms, Queueing time: mean = 96.960 us, max = 202.139 us, min = 6.906 us, total = 25.597 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 264 total (0 active), Execution time: mean = 1.338 ms, total = 353.160 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GcsCheckAlive - 264 total (1 active), Execution time: mean = 268.034 us, total = 70.761 ms, Queueing time: mean = 613.235 us, max = 2.157 ms, min = 5.323 us, total = 161.894 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 264 total (1 active), Execution time: mean = 520.899 us, total = 137.517 ms, Queueing time: mean = 361.565 us, max = 1.805 ms, min = 8.783 us, total = 95.453 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 132 total (1 active), Execution time: mean = 1.697 ms, total = 223.998 ms, Queueing time: mean = 64.775 us, max = 174.696 us, min = 11.175 us, total = 8.550 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s
[state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 22 total (1 active, 1 running), Execution time: mean = 2.632 ms, total = 57.913 ms, Queueing time: mean = 53.021 us, max = 81.847 us, min = 17.325 us, total = 1.166 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] DebugString() time ms: 2
[state-dump]
[state-dump]
[2025-01-21 05:35:16,202 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 05:35:16,276 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 1
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000}
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy {
[state-dump] }
[state-dump] resource_set={CPU : 1, }}: 1/20
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 19
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 120758 total (35 active)
[state-dump] Queueing time: mean = 24.629 ms, max = 418.747 s, min = -0.000 s, total = 2974.154 s
[state-dump] Execution time: mean = 10.093 ms, total = 1218.786 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 28959 total (0 active), Execution time: mean = 489.623 us, total = 14.179 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 28959 total (0 active), Execution time: mean = 34.812 us, total = 1.008 s, Queueing time: mean = 102.207 us, max = 23.460 ms, min = 2.263 us, total = 2.960 s
[state-dump] RaySyncer.OnDemandBroadcasting - 13789 total (1 active), Execution time: mean = 10.412 us, total = 143.573 ms, Queueing time: mean = 81.608 us, max = 4.503 ms, min = -0.000 s, total = 1.125 s
[state-dump] NodeManager.CheckGC - 13789 total (1 active), Execution time: mean = 3.167 us, total = 43.668 ms, Queueing time: mean = 87.887 us, max = 4.512 ms, min = 6.205 us, total = 1.212 s
[state-dump] ObjectManager.UpdateAvailableMemory - 13788 total (0 active), Execution time: mean = 5.422 us, total = 74.756 ms, Queueing time: mean = 96.384 us, max = 802.131 us, min = 3.103 us, total = 1.329 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6898 total (1 active), Execution time: mean = 17.183 us, total = 118.526 ms, Queueing time: mean = 67.842 us, max = 1.134 ms, min = 5.101 us, total = 467.971 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5510 total (1 active), Execution time: mean = 440.201 us, total = 2.426 s, Queueing time: mean = 67.796 us, max = 1.438 ms, min = 6.986 us, total = 373.558 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1380 total (1 active), Execution time: mean = 3.243 us, total = 4.476 ms, Queueing time: mean = 174.325 us, max = 3.551 ms, min = 4.207 us, total = 240.569 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1380 total (1 active), Execution time: mean = 14.796 us, total = 20.418 ms, Queueing time: mean = 66.829 us, max = 2.658 ms, min = 8.343 us, total = 92.224 ms
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1380 total (1 active), Execution time: mean = 8.154 us, total = 11.253 ms, Queueing time: mean = 171.191 us, max = 3.537 ms, min = 6.711 us, total = 236.244 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1379 total (0 active), Execution time: mean = 590.418 us, total = 814.186 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1379 total (0 active), Execution time: mean = 103.777 us, total = 143.108 ms, Queueing time: mean = 102.958 us, max = 901.407 us, min = 14.850 us, total = 141.979 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 461 total (1 active), Execution time: mean = 8.269 us, total = 3.812 ms, Queueing time: mean = 68.096 us, max = 496.804 us, min = 17.556 us, total = 31.392 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 276 total (0 active), Execution time: mean = 48.153 us, total = 13.290 ms, Queueing time: mean = 97.751 us, max = 202.139 us, min = 6.906 us, total = 26.979 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 276 total (0 active), Execution time: mean = 1.338 ms, total = 369.324 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GcsCheckAlive - 276 total (1 active), Execution time: mean = 268.311 us, total = 74.054 ms, Queueing time: mean = 615.121 us, max = 2.157 ms, min = 5.323 us, total = 169.773 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 276 total (1 active), Execution time: mean = 520.390 us, total = 143.628 ms, Queueing time: mean = 364.318 us, max = 1.805 ms, min = 8.783 us, total = 100.552 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 138 total (1 active), Execution time: mean = 1.701 ms, total = 234.677 ms, Queueing time: mean = 65.328 us, max = 174.696 us, min = 11.175 us, total = 9.015 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s
[state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 23 total (1 active, 1 running), Execution time: mean = 2.622 ms, total = 60.301 ms, Queueing time: mean = 58.203 us, max = 172.215 us, min = 17.325 us, total = 1.339 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0
active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:36:16,203 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 
2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:36:16,277 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, 
"labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 
[state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request 
bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - 
cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 125989 total (35 active) [state-dump] Queueing time: mean = 23.610 ms, max = 418.747 s, min = -0.000 s, total = 2974.561 s [state-dump] Execution time: mean = 9.681 ms, total = 1219.728 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 30219 total (0 active), Execution time: mean = 491.853 us, total = 14.863 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 30219 total (0 active), Execution time: mean = 34.986 us, total = 1.057 s, Queueing time: mean = 102.852 us, max = 23.460 ms, min = 2.263 us, total = 3.108 s [state-dump] RaySyncer.OnDemandBroadcasting - 14388 total (1 active), Execution time: mean = 10.393 us, total = 149.531 ms, Queueing time: mean = 81.761 us, max = 4.503 ms, min = -0.000 s, total = 1.176 s [state-dump] NodeManager.CheckGC - 14388 total (1 active), Execution time: mean = 3.171 us, total = 45.621 ms, Queueing time: mean = 88.020 us, max = 4.512 ms, min = 3.126 us, total = 1.266 s [state-dump] ObjectManager.UpdateAvailableMemory - 14387 total (0 active), Execution time: mean = 5.433 us, total = 78.161 ms, Queueing time: mean = 96.979 us, max = 802.131 us, min = 3.103 us, total = 1.395 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7198 total (1 active), Execution time: mean = 17.178 us, total = 123.645 ms, Queueing time: mean = 68.009 us, max = 1.134 ms, min = 5.101 us, total = 489.526 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5749 total (1 active), Execution time: mean = 440.519 us, total = 2.533 s, Queueing time: mean = 68.023 
us, max = 1.438 ms, min = 6.986 us, total = 391.067 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1440 total (1 active), Execution time: mean = 3.231 us, total = 4.653 ms, Queueing time: mean = 174.312 us, max = 3.551 ms, min = 4.207 us, total = 251.009 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1440 total (1 active), Execution time: mean = 14.841 us, total = 21.372 ms, Queueing time: mean = 66.966 us, max = 2.658 ms, min = 8.343 us, total = 96.430 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1440 total (1 active), Execution time: mean = 8.156 us, total = 11.745 ms, Queueing time: mean = 171.173 us, max = 3.537 ms, min = 6.711 us, total = 246.489 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1439 total (0 active), Execution time: mean = 593.248 us, total = 853.684 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1439 total (0 active), Execution time: mean = 104.306 us, total = 150.096 ms, Queueing time: mean = 103.654 us, max = 901.407 us, min = 13.883 us, total = 149.158 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 481 total (1 active), Execution time: mean = 8.306 us, total = 3.995 ms, Queueing time: mean = 68.334 us, max = 496.804 us, min = 17.556 us, total = 32.869 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 288 total (0 active), Execution time: mean = 48.086 us, total = 13.849 ms, Queueing time: mean = 98.949 us, max = 202.139 us, min = 6.906 us, total = 28.497 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 288 total (0 active), Execution time: mean = 1.337 ms, total = 384.926 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 288 total (1 active), Execution time: mean = 267.505 us, total = 77.041 
ms, Queueing time: mean = 615.367 us, max = 2.157 ms, min = 5.323 us, total = 177.226 ms [state-dump] NodeManager.deadline_timer.record_metrics - 288 total (1 active), Execution time: mean = 517.348 us, total = 148.996 ms, Queueing time: mean = 366.606 us, max = 1.805 ms, min = 8.783 us, total = 105.582 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 144 total (1 active), Execution time: mean = 1.697 ms, total = 244.369 ms, Queueing time: mean = 67.003 us, max = 174.696 us, min = 11.175 us, total = 9.648 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 24 total (1 active, 1 running), Execution time: mean = 2.635 ms, total = 63.233 ms, Queueing time: mean = 57.782 us, max = 172.215 us, min = 17.325 us, total = 1.387 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, 
total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, 
total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 
186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:37:16,203 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:37:16,280 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: 
[10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [190000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 869061529600000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 869061529600000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_csv_file pid=16856 worker_id=bdb2b9463e9e49360a1f1d5bfaa73ef46a03b5efa5248a9360109312): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_csv_file, function_hash=8d91a724246e46c4bc28c071a498df7c} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] 
- pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request 
bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 
[state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 131224 total (35 active) [state-dump] Queueing time: mean = 22.671 ms, max = 418.747 s, min = -0.000 s, total = 2974.966 s [state-dump] Execution time: mean = 9.302 ms, total = 1220.629 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 31479 total (0 active), Execution time: mean = 492.570 us, total = 15.506 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 31479 total (0 active), Execution time: mean = 35.113 us, total = 1.105 s, Queueing time: mean = 103.204 us, max = 23.460 ms, min = 2.263 us, total = 3.249 s [state-dump] RaySyncer.OnDemandBroadcasting - 14988 total (1 active), Execution time: mean = 10.395 us, total = 155.798 ms, Queueing time: mean = 82.142 us, max = 4.503 ms, min = -0.000 s, total = 1.231 s [state-dump] NodeManager.CheckGC - 14988 total (1 active), Execution time: mean = 3.175 us, total = 47.592 ms, Queueing time: mean = 88.399 us, max = 4.512 ms, min = 3.126 us, total = 1.325 s [state-dump] ObjectManager.UpdateAvailableMemory - 14987 total (0 active), Execution time: mean = 5.451 us, total = 81.698 ms, Queueing time: mean = 97.344 us, max = 802.131 us, min = 3.103 us, total = 1.459 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7498 total (1 active), Execution time: mean = 17.244 us, total = 129.295 ms, Queueing time: mean = 68.178 us, max = 1.134 ms, min = 5.101 us, total = 511.201 ms [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5989 total (1 active), Execution time: mean = 440.971 us, total = 2.641 s, Queueing time: mean = 68.128 us, max = 1.438 ms, min = 6.986 us, total = 408.022 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1500 total (1 active), Execution time: mean = 3.227 us, total = 4.840 ms, Queueing time: mean = 174.654 us, max = 3.551 ms, min = 4.207 us, total = 261.980 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1500 total (1 active), Execution time: mean = 14.964 us, total = 22.446 ms, Queueing time: mean = 67.037 us, max = 2.658 ms, min = 8.343 us, total = 100.555 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1500 total (1 active), Execution time: mean = 8.168 us, total = 12.252 ms, Queueing time: mean = 171.503 us, max = 3.537 ms, min = 6.711 us, total = 257.255 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1499 total (0 active), Execution time: mean = 594.538 us, total = 891.213 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1499 total (0 active), Execution time: mean = 105.113 us, total = 157.565 ms, Queueing time: mean = 103.888 us, max = 901.407 us, min = 13.883 us, total = 155.728 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 501 total (1 active), Execution time: mean = 8.308 us, total = 4.163 ms, Queueing time: mean = 68.106 us, max = 496.804 us, min = 17.556 us, total = 34.121 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 300 total (0 active), Execution time: mean = 48.005 us, total = 14.401 ms, Queueing time: mean = 99.507 us, max = 210.294 us, min = 6.906 us, total = 29.852 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 300 total (0 active), Execution time: mean = 1.338 ms, total = 401.352 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 300 total (1 active), Execution time: mean = 266.925 us, total = 80.077 ms, Queueing time: mean = 617.209 us, max = 2.199 ms, min = 5.323 us, total = 185.163 ms [state-dump] NodeManager.deadline_timer.record_metrics - 300 total (1 active), Execution time: mean = 515.352 us, total = 154.606 ms, Queueing time: mean = 369.971 us, max = 1.908 ms, min = 8.783 us, total = 110.991 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 150 total (1 active), Execution time: mean = 1.700 ms, total = 255.036 ms, Queueing time: mean = 67.064 us, max = 174.696 us, min = 11.175 us, total = 10.060 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 104 total (21 active), Execution time: mean = 7.146 us, total = 743.225 us, Queueing time: mean = 28.515 s, max = 418.747 s, min = 21.510 us, total = 2965.564 s [state-dump] ClientConnection.async_read.ProcessMessage - 83 total (0 active), Execution time: mean = 883.380 us, total = 73.321 ms, Queueing time: mean = 42.072 us, max = 626.320 us, min = 4.173 us, total = 3.492 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 25 total (1 active, 1 running), Execution time: mean = 2.581 ms, total = 64.532 ms, Queueing time: mean = 57.091 us, max = 172.215 us, min = 17.325 us, total = 1.427 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 21 total (0 active), Execution time: mean = 1.023 us, total = 21.481 us, Queueing time: mean = 109.155 us, max = 324.279 us, min = 19.153 us, total = 2.292 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] RaySyncer.BroadcastMessage - 21 total (0 active), Execution time: mean = 230.971 us, total = 4.850 ms, Queueing time: mean = 710.190 ns, max = 1.120 us, min = 208.000 ns, total = 14.914 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 13 total (0 active), Execution time: mean = 587.514 us, total = 7.638 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 13 total (0 active), Execution time: mean = 123.777 us, total = 1.609 ms, Queueing time: mean = 73.345 us, max = 135.953 us, min = 20.450 us, total = 953.485 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, 
total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us 
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:38:16,204 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:38:16,283 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: 
{"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 
0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request 
bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] 
- active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 136461 total (35 active) [state-dump] Queueing time: mean = 26.129 ms, max = 590.169 s, min = -0.000 s, total = 3565.545 s [state-dump] Execution time: mean = 8.952 ms, total = 1221.544 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 32739 total (0 active), Execution time: mean = 35.144 us, total = 1.151 s, Queueing time: mean = 103.600 us, max = 23.460 ms, min = 2.263 us, total = 3.392 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 32739 total (0 active), Execution time: mean = 493.306 us, total = 16.150 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 15587 total (1 active), Execution time: mean = 10.466 us, total = 163.138 ms, Queueing time: mean = 82.508 us, max = 4.503 ms, min = -0.000 s, total = 1.286 s [state-dump] NodeManager.CheckGC - 15587 total (1 active), Execution time: mean = 3.188 us, total = 49.691 ms, Queueing time: mean = 88.824 us, max = 4.512 ms, min = 3.126 us, total = 1.384 s [state-dump] ObjectManager.UpdateAvailableMemory - 15586 total (0 active), Execution time: mean = 5.482 us, total = 85.444 ms, Queueing time: mean = 97.638 us, max = 802.131 us, min = 3.103 us, total = 1.522 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7798 total (1 active), Execution time: mean = 17.344 us, total = 135.248 ms, Queueing time: mean = 68.416 us, max = 1.134 ms, min = 5.101 us, total = 533.506 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6228 total (1 active), Execution time: mean = 442.047 us, total = 2.753 s, Queueing time: mean = 68.399 us, max = 1.438 ms, min = 6.986 us, total = 
425.990 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1560 total (1 active), Execution time: mean = 8.247 us, total = 12.865 ms, Queueing time: mean = 172.076 us, max = 3.537 ms, min = 6.711 us, total = 268.439 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1560 total (1 active), Execution time: mean = 15.064 us, total = 23.499 ms, Queueing time: mean = 66.950 us, max = 2.658 ms, min = 8.343 us, total = 104.442 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1560 total (1 active), Execution time: mean = 3.237 us, total = 5.049 ms, Queueing time: mean = 175.267 us, max = 3.551 ms, min = 4.207 us, total = 273.417 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1559 total (0 active), Execution time: mean = 596.243 us, total = 929.543 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1559 total (0 active), Execution time: mean = 105.434 us, total = 164.372 ms, Queueing time: mean = 104.197 us, max = 901.407 us, min = 13.883 us, total = 162.443 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 521 total (1 active), Execution time: mean = 8.330 us, total = 4.340 ms, Queueing time: mean = 68.090 us, max = 496.804 us, min = 17.556 us, total = 35.475 ms [state-dump] NodeManager.deadline_timer.record_metrics - 312 total (1 active), Execution time: mean = 515.891 us, total = 160.958 ms, Queueing time: mean = 372.198 us, max = 1.908 ms, min = 8.783 us, total = 116.126 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 312 total (0 active), Execution time: mean = 1.351 ms, total = 421.548 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 312 total (1 active), Execution time: mean = 270.025 us, total = 84.248 ms, Queueing time: mean = 618.310 us, max = 2.199 ms, min = 5.323 us, 
total = 192.913 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 312 total (0 active), Execution time: mean = 48.213 us, total = 15.043 ms, Queueing time: mean = 98.798 us, max = 210.294 us, min = 6.906 us, total = 30.825 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 156 total (1 active), Execution time: mean = 1.704 ms, total = 265.875 ms, Queueing time: mean = 67.342 us, max = 174.696 us, min = 11.175 us, total = 10.505 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 26 total (1 active, 1 running), Execution time: mean = 2.587 ms, total = 67.250 ms, Queueing time: mean = 57.044 us, max = 172.215 us, min = 17.325 us, total = 1.483 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us 
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, 
total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us 
[state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:39:16,204 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:39:16,286 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 
10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push 
requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max 
timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] 
Global stats: 141693 total (35 active) [state-dump] Queueing time: mean = 25.167 ms, max = 590.169 s, min = -0.000 s, total = 3565.935 s [state-dump] Execution time: mean = 8.627 ms, total = 1222.440 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 33999 total (0 active), Execution time: mean = 35.132 us, total = 1.194 s, Queueing time: mean = 103.676 us, max = 23.460 ms, min = 2.263 us, total = 3.525 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 33999 total (0 active), Execution time: mean = 493.673 us, total = 16.784 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 16186 total (1 active), Execution time: mean = 10.484 us, total = 169.689 ms, Queueing time: mean = 82.704 us, max = 4.503 ms, min = -0.000 s, total = 1.339 s [state-dump] NodeManager.CheckGC - 16186 total (1 active), Execution time: mean = 3.199 us, total = 51.774 ms, Queueing time: mean = 89.030 us, max = 4.512 ms, min = 3.126 us, total = 1.441 s [state-dump] ObjectManager.UpdateAvailableMemory - 16185 total (0 active), Execution time: mean = 5.506 us, total = 89.122 ms, Queueing time: mean = 97.835 us, max = 802.131 us, min = 3.103 us, total = 1.583 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8098 total (1 active), Execution time: mean = 17.388 us, total = 140.805 ms, Queueing time: mean = 68.502 us, max = 1.134 ms, min = 5.101 us, total = 554.728 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6468 total (1 active), Execution time: mean = 442.903 us, total = 2.865 s, Queueing time: mean = 68.536 us, max = 1.438 ms, min = 6.986 us, total = 443.294 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1620 total (1 active), Execution time: mean = 8.284 us, total = 13.420 ms, Queueing time: mean = 172.352 us, max = 3.537 ms, min = 6.711 us, total = 279.210 ms [state-dump] 
NodeManager.ScheduleAndDispatchTasks - 1620 total (1 active), Execution time: mean = 15.195 us, total = 24.616 ms, Queueing time: mean = 66.907 us, max = 2.658 ms, min = 8.343 us, total = 108.389 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1620 total (1 active), Execution time: mean = 3.236 us, total = 5.243 ms, Queueing time: mean = 175.571 us, max = 3.551 ms, min = 4.207 us, total = 284.425 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1619 total (0 active), Execution time: mean = 596.844 us, total = 966.290 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1619 total (0 active), Execution time: mean = 105.535 us, total = 170.861 ms, Queueing time: mean = 104.250 us, max = 901.407 us, min = 13.883 us, total = 168.780 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 541 total (1 active), Execution time: mean = 8.376 us, total = 4.531 ms, Queueing time: mean = 67.941 us, max = 496.804 us, min = 17.556 us, total = 36.756 ms [state-dump] NodeManager.deadline_timer.record_metrics - 324 total (1 active), Execution time: mean = 517.223 us, total = 167.580 ms, Queueing time: mean = 373.833 us, max = 1.908 ms, min = 8.783 us, total = 121.122 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 324 total (0 active), Execution time: mean = 1.359 ms, total = 440.225 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 324 total (1 active), Execution time: mean = 272.480 us, total = 88.283 ms, Queueing time: mean = 617.705 us, max = 2.199 ms, min = 5.323 us, total = 200.136 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 324 total (0 active), Execution time: mean = 48.515 us, total = 15.719 ms, Queueing time: mean = 98.903 us, max = 210.294 us, min = 6.906 us, 
total = 32.045 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 162 total (1 active), Execution time: mean = 1.710 ms, total = 276.964 ms, Queueing time: mean = 67.323 us, max = 174.696 us, min = 11.175 us, total = 10.906 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 27 total (1 active, 1 running), Execution time: mean = 2.588 ms, total = 69.885 ms, Queueing time: mean = 60.216 us, max = 172.215 us, min = 17.325 us, total = 1.626 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 
us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:40:16,205 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - 
objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:40:16,289 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 
is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma 
error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 146926 total (35 active) [state-dump] Queueing time: mean = 24.273 ms, 
max = 590.169 s, min = -0.000 s, total = 3566.361 s [state-dump] Execution time: mean = 8.326 ms, total = 1223.377 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 35259 total (0 active), Execution time: mean = 35.180 us, total = 1.240 s, Queueing time: mean = 104.064 us, max = 23.460 ms, min = 2.263 us, total = 3.669 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 35259 total (0 active), Execution time: mean = 495.147 us, total = 17.458 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 16786 total (1 active), Execution time: mean = 10.482 us, total = 175.955 ms, Queueing time: mean = 83.177 us, max = 4.503 ms, min = -0.000 s, total = 1.396 s [state-dump] NodeManager.CheckGC - 16786 total (1 active), Execution time: mean = 3.200 us, total = 53.718 ms, Queueing time: mean = 89.500 us, max = 4.512 ms, min = 3.126 us, total = 1.502 s [state-dump] ObjectManager.UpdateAvailableMemory - 16785 total (0 active), Execution time: mean = 5.535 us, total = 92.898 ms, Queueing time: mean = 98.507 us, max = 1.104 ms, min = 3.103 us, total = 1.653 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8397 total (1 active), Execution time: mean = 17.438 us, total = 146.428 ms, Queueing time: mean = 68.855 us, max = 1.134 ms, min = 5.101 us, total = 578.173 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6707 total (1 active), Execution time: mean = 443.128 us, total = 2.972 s, Queueing time: mean = 68.764 us, max = 1.438 ms, min = 3.140 us, total = 461.203 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1680 total (1 active), Execution time: mean = 8.300 us, total = 13.943 ms, Queueing time: mean = 173.133 us, max = 3.537 ms, min = 6.711 us, total = 290.863 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1680 total (1 active), Execution time: mean = 15.269 us, 
total = 25.651 ms, Queueing time: mean = 67.085 us, max = 2.658 ms, min = 8.343 us, total = 112.703 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1680 total (1 active), Execution time: mean = 3.244 us, total = 5.451 ms, Queueing time: mean = 176.359 us, max = 3.551 ms, min = 4.207 us, total = 296.283 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1679 total (0 active), Execution time: mean = 598.350 us, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1679 total (0 active), Execution time: mean = 105.694 us, total = 177.460 ms, Queueing time: mean = 104.290 us, max = 901.407 us, min = 13.883 us, total = 175.103 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 561 total (1 active), Execution time: mean = 8.387 us, total = 4.705 ms, Queueing time: mean = 68.084 us, max = 496.804 us, min = 17.556 us, total = 38.195 ms [state-dump] NodeManager.deadline_timer.record_metrics - 336 total (1 active), Execution time: mean = 517.809 us, total = 173.984 ms, Queueing time: mean = 377.308 us, max = 1.908 ms, min = 8.783 us, total = 126.776 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 336 total (0 active), Execution time: mean = 1.369 ms, total = 459.900 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 336 total (1 active), Execution time: mean = 275.243 us, total = 92.482 ms, Queueing time: mean = 619.014 us, max = 2.199 ms, min = 5.323 us, total = 207.989 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 336 total (0 active), Execution time: mean = 48.983 us, total = 16.458 ms, Queueing time: mean = 99.443 us, max = 210.294 us, min = 6.906 us, total = 33.413 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 168 total (1 active), 
Execution time: mean = 1.717 ms, total = 288.524 ms, Queueing time: mean = 68.089 us, max = 174.696 us, min = 11.175 us, total = 11.439 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 28 total (1 active, 1 running), Execution time: mean = 2.590 ms, total = 72.533 ms, Queueing time: mean = 59.977 us, max = 172.215 us, min = 17.325 us, total = 1.679 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 
us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:41:16,205 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:41:16,291 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 152158 total (35 active) [state-dump] Queueing time: mean = 23.441 ms, 
max = 590.169 s, min = -0.000 s, total = 3566.743 s [state-dump] Execution time: mean = 8.046 ms, total = 1224.248 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 36519 total (0 active), Execution time: mean = 35.068 us, total = 1.281 s, Queueing time: mean = 104.098 us, max = 23.460 ms, min = 2.263 us, total = 3.802 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 36519 total (0 active), Execution time: mean = 495.168 us, total = 18.083 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 17385 total (1 active), Execution time: mean = 10.448 us, total = 181.645 ms, Queueing time: mean = 83.465 us, max = 4.503 ms, min = -0.000 s, total = 1.451 s [state-dump] NodeManager.CheckGC - 17385 total (1 active), Execution time: mean = 3.190 us, total = 55.456 ms, Queueing time: mean = 89.766 us, max = 4.512 ms, min = 3.126 us, total = 1.561 s [state-dump] ObjectManager.UpdateAvailableMemory - 17384 total (0 active), Execution time: mean = 5.522 us, total = 96.002 ms, Queueing time: mean = 98.348 us, max = 1.104 ms, min = 2.197 us, total = 1.710 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8697 total (1 active), Execution time: mean = 17.383 us, total = 151.183 ms, Queueing time: mean = 68.777 us, max = 1.134 ms, min = 5.101 us, total = 598.156 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6947 total (1 active), Execution time: mean = 442.847 us, total = 3.076 s, Queueing time: mean = 68.690 us, max = 1.438 ms, min = 3.140 us, total = 477.188 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1740 total (1 active), Execution time: mean = 8.279 us, total = 14.405 ms, Queueing time: mean = 173.056 us, max = 3.537 ms, min = 6.711 us, total = 301.117 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1740 total (1 active), Execution time: mean = 15.288 us, 
total = 26.602 ms, Queueing time: mean = 67.074 us, max = 2.658 ms, min = 8.343 us, total = 116.708 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1740 total (1 active), Execution time: mean = 3.240 us, total = 5.638 ms, Queueing time: mean = 176.269 us, max = 3.551 ms, min = 4.207 us, total = 306.709 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1739 total (0 active), Execution time: mean = 598.332 us, total = 1.040 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1739 total (0 active), Execution time: mean = 105.669 us, total = 183.758 ms, Queueing time: mean = 104.018 us, max = 901.407 us, min = 13.883 us, total = 180.888 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 581 total (1 active), Execution time: mean = 8.378 us, total = 4.868 ms, Queueing time: mean = 68.175 us, max = 496.804 us, min = 17.376 us, total = 39.610 ms [state-dump] NodeManager.deadline_timer.record_metrics - 348 total (1 active), Execution time: mean = 516.953 us, total = 179.900 ms, Queueing time: mean = 378.072 us, max = 1.908 ms, min = 8.783 us, total = 131.569 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 348 total (0 active), Execution time: mean = 1.375 ms, total = 478.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 348 total (1 active), Execution time: mean = 277.278 us, total = 96.493 ms, Queueing time: mean = 616.780 us, max = 2.199 ms, min = 5.323 us, total = 214.640 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 348 total (0 active), Execution time: mean = 49.257 us, total = 17.141 ms, Queueing time: mean = 99.799 us, max = 210.294 us, min = 6.906 us, total = 34.730 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 174 total (1 active), 
Execution time: mean = 1.720 ms, total = 299.212 ms, Queueing time: mean = 67.731 us, max = 174.696 us, min = 11.175 us, total = 11.785 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 29 total (1 active, 1 running), Execution time: mean = 2.604 ms, total = 75.506 ms, Queueing time: mean = 61.069 us, max = 172.215 us, min = 17.325 us, total = 1.771 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.688 s, total = 1198.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 333.624 
us, total = 1.001 ms, Queueing time: mean = 88.799 us, max = 184.802 us, min = 20.320 us, total = 266.398 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.715 us, total = 7.430 us, Queueing time: mean = 28.725 us, max = 57.450 us, min = 57.450 us, total = 57.450 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-21 05:42:16,205 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:42:16,294 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 157395 total (35 active) [state-dump] Queueing time: mean = 22.663 ms, 
max = 590.169 s, min = -0.000 s, total = 3567.040 s [state-dump] Execution time: mean = 11.595 ms, total = 1824.962 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 37779 total (0 active), Execution time: mean = 34.766 us, total = 1.313 s, Queueing time: mean = 103.193 us, max = 23.460 ms, min = 2.263 us, total = 3.899 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 37779 total (0 active), Execution time: mean = 491.730 us, total = 18.577 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 17985 total (1 active), Execution time: mean = 10.378 us, total = 186.641 ms, Queueing time: mean = 83.058 us, max = 4.503 ms, min = -0.000 s, total = 1.494 s [state-dump] NodeManager.CheckGC - 17985 total (1 active), Execution time: mean = 3.181 us, total = 57.218 ms, Queueing time: mean = 89.303 us, max = 4.512 ms, min = 3.126 us, total = 1.606 s [state-dump] ObjectManager.UpdateAvailableMemory - 17984 total (0 active), Execution time: mean = 5.483 us, total = 98.601 ms, Queueing time: mean = 97.333 us, max = 1.104 ms, min = 2.197 us, total = 1.750 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8997 total (1 active), Execution time: mean = 17.265 us, total = 155.337 ms, Queueing time: mean = 68.191 us, max = 1.134 ms, min = 5.101 us, total = 613.519 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7186 total (1 active), Execution time: mean = 441.732 us, total = 3.174 s, Queueing time: mean = 68.084 us, max = 1.438 ms, min = 3.140 us, total = 489.253 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1800 total (1 active), Execution time: mean = 8.254 us, total = 14.858 ms, Queueing time: mean = 172.988 us, max = 3.537 ms, min = 2.019 us, total = 311.378 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1800 total (1 active), Execution time: mean = 15.184 us, 
total = 27.330 ms, Queueing time: mean = 66.378 us, max = 2.658 ms, min = 8.343 us, total = 119.481 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1800 total (1 active), Execution time: mean = 3.226 us, total = 5.807 ms, Queueing time: mean = 176.198 us, max = 3.551 ms, min = 4.207 us, total = 317.156 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1799 total (0 active), Execution time: mean = 594.684 us, total = 1.070 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1799 total (0 active), Execution time: mean = 105.190 us, total = 189.237 ms, Queueing time: mean = 102.984 us, max = 901.407 us, min = 13.883 us, total = 185.269 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 601 total (1 active), Execution time: mean = 8.330 us, total = 5.007 ms, Queueing time: mean = 68.094 us, max = 496.804 us, min = 12.362 us, total = 40.925 ms [state-dump] NodeManager.deadline_timer.record_metrics - 360 total (1 active), Execution time: mean = 515.212 us, total = 185.476 ms, Queueing time: mean = 379.596 us, max = 1.908 ms, min = 8.783 us, total = 136.655 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 360 total (0 active), Execution time: mean = 1.374 ms, total = 494.511 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 360 total (1 active), Execution time: mean = 277.975 us, total = 100.071 ms, Queueing time: mean = 615.842 us, max = 2.199 ms, min = 5.323 us, total = 221.703 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 360 total (0 active), Execution time: mean = 49.099 us, total = 17.676 ms, Queueing time: mean = 99.097 us, max = 210.294 us, min = 6.906 us, total = 35.675 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 180 total (1 active), 
Execution time: mean = 1.720 ms, total = 309.597 ms, Queueing time: mean = 67.387 us, max = 174.696 us, min = 11.175 us, total = 12.130 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 30 total (1 active, 1 running), Execution time: mean = 2.596 ms, total = 77.871 ms, Queueing time: mean = 60.243 us, max = 172.215 us, min = 17.325 us, total = 1.807 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:43:16,205 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:43:16,297 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 162627 total (35 active) [state-dump] Queueing time: mean = 21.936 ms, 
max = 590.169 s, min = -0.000 s, total = 3567.396 s [state-dump] Execution time: mean = 11.227 ms, total = 1825.815 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 39039 total (0 active), Execution time: mean = 34.674 us, total = 1.354 s, Queueing time: mean = 103.020 us, max = 23.460 ms, min = 2.263 us, total = 4.022 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 39039 total (0 active), Execution time: mean = 491.536 us, total = 19.189 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 18584 total (1 active), Execution time: mean = 10.298 us, total = 191.382 ms, Queueing time: mean = 82.767 us, max = 4.503 ms, min = -0.000 s, total = 1.538 s [state-dump] NodeManager.CheckGC - 18584 total (1 active), Execution time: mean = 3.170 us, total = 58.913 ms, Queueing time: mean = 88.949 us, max = 4.512 ms, min = 3.126 us, total = 1.653 s [state-dump] ObjectManager.UpdateAvailableMemory - 18583 total (0 active), Execution time: mean = 5.457 us, total = 101.412 ms, Queueing time: mean = 97.065 us, max = 1.104 ms, min = 2.197 us, total = 1.804 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9297 total (1 active), Execution time: mean = 17.170 us, total = 159.632 ms, Queueing time: mean = 67.976 us, max = 1.134 ms, min = 5.101 us, total = 631.977 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7426 total (1 active), Execution time: mean = 441.149 us, total = 3.276 s, Queueing time: mean = 68.110 us, max = 1.438 ms, min = 3.140 us, total = 505.782 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1860 total (1 active), Execution time: mean = 8.251 us, total = 15.347 ms, Queueing time: mean = 174.447 us, max = 3.537 ms, min = 2.019 us, total = 324.472 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1860 total (1 active), Execution time: mean = 15.158 us, 
total = 28.195 ms, Queueing time: mean = 66.352 us, max = 2.658 ms, min = 8.343 us, total = 123.415 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1860 total (1 active), Execution time: mean = 3.229 us, total = 6.005 ms, Queueing time: mean = 177.654 us, max = 3.551 ms, min = 4.207 us, total = 330.436 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1859 total (0 active), Execution time: mean = 594.092 us, total = 1.104 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1859 total (0 active), Execution time: mean = 105.015 us, total = 195.224 ms, Queueing time: mean = 102.755 us, max = 901.407 us, min = 13.883 us, total = 191.021 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 621 total (1 active), Execution time: mean = 8.309 us, total = 5.160 ms, Queueing time: mean = 68.056 us, max = 496.804 us, min = 12.362 us, total = 42.263 ms [state-dump] NodeManager.deadline_timer.record_metrics - 372 total (1 active), Execution time: mean = 520.245 us, total = 193.531 ms, Queueing time: mean = 381.006 us, max = 1.908 ms, min = 8.783 us, total = 141.734 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 372 total (0 active), Execution time: mean = 1.373 ms, total = 510.847 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 372 total (1 active), Execution time: mean = 279.253 us, total = 103.882 ms, Queueing time: mean = 621.620 us, max = 2.199 ms, min = 5.323 us, total = 231.242 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 372 total (0 active), Execution time: mean = 49.141 us, total = 18.281 ms, Queueing time: mean = 99.012 us, max = 210.294 us, min = 6.906 us, total = 36.832 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 186 total (1 active), 
Execution time: mean = 1.723 ms, total = 320.538 ms, Queueing time: mean = 67.221 us, max = 174.696 us, min = 11.175 us, total = 12.503 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 31 total (1 active, 1 running), Execution time: mean = 2.603 ms, total = 80.708 ms, Queueing time: mean = 59.333 us, max = 172.215 us, min = 17.325 us, total = 1.839 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:44:16,206 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:44:16,299 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 167861 total (35 active) [state-dump] Queueing time: mean = 21.254 ms, 
max = 590.169 s, min = -0.000 s, total = 3567.793 s [state-dump] Execution time: mean = 10.883 ms, total = 1826.750 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 40299 total (0 active), Execution time: mean = 34.714 us, total = 1.399 s, Queueing time: mean = 103.583 us, max = 23.460 ms, min = 2.263 us, total = 4.174 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 40299 total (0 active), Execution time: mean = 492.999 us, total = 19.867 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 19184 total (1 active), Execution time: mean = 10.268 us, total = 196.977 ms, Queueing time: mean = 82.892 us, max = 4.503 ms, min = -0.000 s, total = 1.590 s [state-dump] NodeManager.CheckGC - 19184 total (1 active), Execution time: mean = 3.166 us, total = 60.730 ms, Queueing time: mean = 89.047 us, max = 4.512 ms, min = 3.126 us, total = 1.708 s [state-dump] ObjectManager.UpdateAvailableMemory - 19183 total (0 active), Execution time: mean = 5.460 us, total = 104.741 ms, Queueing time: mean = 97.210 us, max = 1.104 ms, min = 2.197 us, total = 1.865 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9597 total (1 active), Execution time: mean = 17.149 us, total = 164.577 ms, Queueing time: mean = 68.065 us, max = 1.134 ms, min = 5.101 us, total = 653.222 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7665 total (1 active), Execution time: mean = 441.369 us, total = 3.383 s, Queueing time: mean = 68.187 us, max = 1.438 ms, min = 3.140 us, total = 522.650 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1920 total (1 active), Execution time: mean = 8.245 us, total = 15.830 ms, Queueing time: mean = 173.328 us, max = 3.537 ms, min = 2.019 us, total = 332.790 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1920 total (1 active), Execution time: mean = 15.173 us, 
total = 29.132 ms, Queueing time: mean = 66.277 us, max = 2.658 ms, min = 8.343 us, total = 127.252 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1920 total (1 active), Execution time: mean = 3.244 us, total = 6.228 ms, Queueing time: mean = 176.535 us, max = 3.551 ms, min = 4.207 us, total = 338.948 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1919 total (0 active), Execution time: mean = 595.181 us, total = 1.142 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1919 total (0 active), Execution time: mean = 105.010 us, total = 201.514 ms, Queueing time: mean = 103.035 us, max = 901.407 us, min = 13.883 us, total = 197.725 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 641 total (1 active), Execution time: mean = 8.310 us, total = 5.327 ms, Queueing time: mean = 68.033 us, max = 496.804 us, min = 12.362 us, total = 43.609 ms [state-dump] NodeManager.deadline_timer.record_metrics - 384 total (1 active), Execution time: mean = 521.445 us, total = 200.235 ms, Queueing time: mean = 374.986 us, max = 1.908 ms, min = 8.783 us, total = 143.994 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 384 total (0 active), Execution time: mean = 1.380 ms, total = 530.031 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 384 total (1 active), Execution time: mean = 280.844 us, total = 107.844 ms, Queueing time: mean = 614.644 us, max = 2.199 ms, min = 5.323 us, total = 236.023 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 384 total (0 active), Execution time: mean = 49.290 us, total = 18.927 ms, Queueing time: mean = 99.578 us, max = 210.294 us, min = 6.906 us, total = 38.238 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 192 total (1 active), 
Execution time: mean = 1.723 ms, total = 330.794 ms, Queueing time: mean = 67.222 us, max = 174.696 us, min = 11.175 us, total = 12.907 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 32 total (1 active, 1 running), Execution time: mean = 2.607 ms, total = 83.428 ms, Queueing time: mean = 59.865 us, max = 172.215 us, min = 17.325 us, total = 1.916 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:45:16,206 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:45:16,302 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 173093 total (35 active) [state-dump] Queueing time: mean = 20.614 ms, 
max = 590.169 s, min = -0.000 s, total = 3568.198 s [state-dump] Execution time: mean = 10.559 ms, total = 1827.651 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 41559 total (0 active), Execution time: mean = 34.716 us, total = 1.443 s, Queueing time: mean = 103.972 us, max = 23.460 ms, min = 2.263 us, total = 4.321 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 41559 total (0 active), Execution time: mean = 493.708 us, total = 20.518 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 19783 total (1 active), Execution time: mean = 10.245 us, total = 202.669 ms, Queueing time: mean = 82.924 us, max = 4.503 ms, min = -0.000 s, total = 1.640 s [state-dump] NodeManager.CheckGC - 19783 total (1 active), Execution time: mean = 3.163 us, total = 62.568 ms, Queueing time: mean = 89.062 us, max = 4.512 ms, min = 3.126 us, total = 1.762 s [state-dump] ObjectManager.UpdateAvailableMemory - 19782 total (0 active), Execution time: mean = 5.468 us, total = 108.177 ms, Queueing time: mean = 97.630 us, max = 1.104 ms, min = 2.197 us, total = 1.931 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9897 total (1 active), Execution time: mean = 17.108 us, total = 169.317 ms, Queueing time: mean = 68.017 us, max = 1.134 ms, min = 5.101 us, total = 673.165 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7905 total (1 active), Execution time: mean = 441.166 us, total = 3.487 s, Queueing time: mean = 68.405 us, max = 1.438 ms, min = 3.140 us, total = 540.743 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1980 total (1 active), Execution time: mean = 8.283 us, total = 16.401 ms, Queueing time: mean = 173.778 us, max = 3.537 ms, min = 2.019 us, total = 344.081 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1980 total (1 active), Execution time: mean = 15.182 us, 
total = 30.060 ms, Queueing time: mean = 66.490 us, max = 2.658 ms, min = 8.343 us, total = 131.650 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1980 total (1 active), Execution time: mean = 3.241 us, total = 6.418 ms, Queueing time: mean = 177.013 us, max = 3.551 ms, min = 4.207 us, total = 350.487 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1979 total (0 active), Execution time: mean = 595.605 us, total = 1.179 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1979 total (0 active), Execution time: mean = 104.905 us, total = 207.607 ms, Queueing time: mean = 103.372 us, max = 901.407 us, min = 13.883 us, total = 204.574 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 661 total (1 active), Execution time: mean = 8.366 us, total = 5.530 ms, Queueing time: mean = 68.160 us, max = 496.804 us, min = 12.362 us, total = 45.054 ms [state-dump] NodeManager.deadline_timer.record_metrics - 396 total (1 active), Execution time: mean = 521.588 us, total = 206.549 ms, Queueing time: mean = 376.663 us, max = 1.908 ms, min = 8.783 us, total = 149.159 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 396 total (0 active), Execution time: mean = 1.384 ms, total = 547.920 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 396 total (1 active), Execution time: mean = 282.281 us, total = 111.783 ms, Queueing time: mean = 615.242 us, max = 2.199 ms, min = 5.323 us, total = 243.636 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 396 total (0 active), Execution time: mean = 49.478 us, total = 19.593 ms, Queueing time: mean = 99.780 us, max = 210.294 us, min = 6.906 us, total = 39.513 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 198 total (1 active), 
Execution time: mean = 1.725 ms, total = 341.460 ms, Queueing time: mean = 67.670 us, max = 174.696 us, min = 11.175 us, total = 13.399 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 33 total (1 active, 1 running), Execution time: mean = 2.607 ms, total = 86.042 ms, Queueing time: mean = 59.857 us, max = 172.215 us, min = 17.325 us, total = 1.975 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:46:16,206 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:46:16,305 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 178327 total (35 active) [state-dump] Queueing time: mean = 20.011 ms, 
max = 590.169 s, min = -0.000 s, total = 3568.509 s [state-dump] Execution time: mean = 10.253 ms, total = 1828.398 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 42819 total (0 active), Execution time: mean = 34.515 us, total = 1.478 s, Queueing time: mean = 103.406 us, max = 23.460 ms, min = 2.004 us, total = 4.428 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 42819 total (0 active), Execution time: mean = 491.227 us, total = 21.034 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 20383 total (1 active), Execution time: mean = 10.198 us, total = 207.876 ms, Queueing time: mean = 82.596 us, max = 4.503 ms, min = -0.000 s, total = 1.684 s [state-dump] NodeManager.CheckGC - 20383 total (1 active), Execution time: mean = 3.155 us, total = 64.313 ms, Queueing time: mean = 88.699 us, max = 4.512 ms, min = 3.126 us, total = 1.808 s [state-dump] ObjectManager.UpdateAvailableMemory - 20382 total (0 active), Execution time: mean = 5.444 us, total = 110.966 ms, Queueing time: mean = 96.978 us, max = 1.104 ms, min = 2.197 us, total = 1.977 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10197 total (1 active), Execution time: mean = 17.066 us, total = 174.017 ms, Queueing time: mean = 67.799 us, max = 1.134 ms, min = 5.101 us, total = 691.342 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8144 total (1 active), Execution time: mean = 440.945 us, total = 3.591 s, Queueing time: mean = 68.079 us, max = 1.438 ms, min = 3.140 us, total = 554.432 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2040 total (1 active), Execution time: mean = 8.253 us, total = 16.835 ms, Queueing time: mean = 173.045 us, max = 3.537 ms, min = 2.019 us, total = 353.011 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2040 total (1 active), Execution time: mean = 15.134 us, 
total = 30.873 ms, Queueing time: mean = 66.102 us, max = 2.658 ms, min = 8.343 us, total = 134.848 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2040 total (1 active), Execution time: mean = 3.232 us, total = 6.594 ms, Queueing time: mean = 176.266 us, max = 3.551 ms, min = 4.207 us, total = 359.582 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2039 total (0 active), Execution time: mean = 593.437 us, total = 1.210 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2039 total (0 active), Execution time: mean = 104.681 us, total = 213.444 ms, Queueing time: mean = 102.697 us, max = 901.407 us, min = 13.883 us, total = 209.400 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 681 total (1 active), Execution time: mean = 8.314 us, total = 5.662 ms, Queueing time: mean = 67.582 us, max = 496.804 us, min = 12.362 us, total = 46.023 ms [state-dump] NodeManager.deadline_timer.record_metrics - 408 total (1 active), Execution time: mean = 520.421 us, total = 212.332 ms, Queueing time: mean = 374.798 us, max = 1.908 ms, min = 8.783 us, total = 152.918 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 408 total (0 active), Execution time: mean = 1.384 ms, total = 564.574 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 408 total (1 active), Execution time: mean = 282.619 us, total = 115.309 ms, Queueing time: mean = 611.671 us, max = 2.199 ms, min = 5.323 us, total = 249.562 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 408 total (0 active), Execution time: mean = 49.343 us, total = 20.132 ms, Queueing time: mean = 99.089 us, max = 210.294 us, min = 6.906 us, total = 40.428 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 204 total (1 active), 
Execution time: mean = 1.721 ms, total = 351.132 ms, Queueing time: mean = 67.076 us, max = 174.696 us, min = 11.175 us, total = 13.684 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 34 total (1 active, 1 running), Execution time: mean = 2.616 ms, total = 88.932 ms, Queueing time: mean = 60.095 us, max = 172.215 us, min = 17.325 us, total = 2.043 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:47:16,207 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:47:16,308 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 183559 total (35 active) [state-dump] Queueing time: mean = 19.443 ms, 
max = 590.169 s, min = -0.000 s, total = 3568.883 s [state-dump] Execution time: mean = 9.966 ms, total = 1829.263 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 44079 total (0 active), Execution time: mean = 34.428 us, total = 1.518 s, Queueing time: mean = 103.629 us, max = 23.460 ms, min = 2.004 us, total = 4.568 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 44079 total (0 active), Execution time: mean = 491.545 us, total = 21.667 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 20982 total (1 active), Execution time: mean = 10.146 us, total = 212.889 ms, Queueing time: mean = 82.421 us, max = 4.503 ms, min = -0.000 s, total = 1.729 s [state-dump] NodeManager.CheckGC - 20982 total (1 active), Execution time: mean = 3.145 us, total = 65.992 ms, Queueing time: mean = 88.483 us, max = 4.512 ms, min = 3.126 us, total = 1.857 s [state-dump] ObjectManager.UpdateAvailableMemory - 20981 total (0 active), Execution time: mean = 5.430 us, total = 113.919 ms, Queueing time: mean = 97.198 us, max = 1.104 ms, min = 2.197 us, total = 2.039 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10497 total (1 active), Execution time: mean = 16.992 us, total = 178.364 ms, Queueing time: mean = 67.804 us, max = 1.134 ms, min = 5.101 us, total = 711.741 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8384 total (1 active), Execution time: mean = 440.015 us, total = 3.689 s, Queueing time: mean = 67.924 us, max = 1.438 ms, min = 3.140 us, total = 569.479 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2100 total (1 active), Execution time: mean = 8.247 us, total = 17.319 ms, Queueing time: mean = 172.382 us, max = 3.537 ms, min = 2.019 us, total = 362.003 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2100 total (1 active), Execution time: mean = 15.086 us, 
total = 31.680 ms, Queueing time: mean = 65.930 us, max = 2.658 ms, min = 7.553 us, total = 138.453 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2100 total (1 active), Execution time: mean = 3.227 us, total = 6.776 ms, Queueing time: mean = 175.608 us, max = 3.551 ms, min = 4.207 us, total = 368.777 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2099 total (0 active), Execution time: mean = 593.724 us, total = 1.246 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2099 total (0 active), Execution time: mean = 104.554 us, total = 219.460 ms, Queueing time: mean = 103.034 us, max = 901.407 us, min = 13.883 us, total = 216.269 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 701 total (1 active), Execution time: mean = 8.278 us, total = 5.803 ms, Queueing time: mean = 67.451 us, max = 496.804 us, min = 12.362 us, total = 47.283 ms [state-dump] NodeManager.deadline_timer.record_metrics - 420 total (1 active), Execution time: mean = 518.061 us, total = 217.586 ms, Queueing time: mean = 373.276 us, max = 1.908 ms, min = 8.783 us, total = 156.776 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 420 total (0 active), Execution time: mean = 1.382 ms, total = 580.538 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 420 total (1 active), Execution time: mean = 282.419 us, total = 118.616 ms, Queueing time: mean = 608.273 us, max = 2.199 ms, min = 5.323 us, total = 255.475 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 420 total (0 active), Execution time: mean = 49.313 us, total = 20.711 ms, Queueing time: mean = 99.070 us, max = 210.294 us, min = 6.906 us, total = 41.609 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 210 total (1 active), 
Execution time: mean = 1.713 ms, total = 359.810 ms, Queueing time: mean = 66.662 us, max = 174.696 us, min = 11.175 us, total = 13.999 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 35 total (1 active, 1 running), Execution time: mean = 2.623 ms, total = 91.816 ms, Queueing time: mean = 60.201 us, max = 172.215 us, min = 17.325 us, total = 2.107 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:48:16,207 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:48:16,311 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 188793 total (35 active) [state-dump] Queueing time: mean = 18.906 ms, 
max = 590.169 s, min = -0.000 s, total = 3569.321 s [state-dump] Execution time: mean = 9.694 ms, total = 1830.195 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 45339 total (0 active), Execution time: mean = 34.479 us, total = 1.563 s, Queueing time: mean = 103.999 us, max = 23.460 ms, min = 2.000 us, total = 4.715 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 45339 total (0 active), Execution time: mean = 492.831 us, total = 22.344 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 21582 total (1 active), Execution time: mean = 10.149 us, total = 219.035 ms, Queueing time: mean = 83.010 us, max = 6.066 ms, min = -0.000 s, total = 1.792 s [state-dump] NodeManager.CheckGC - 21582 total (1 active), Execution time: mean = 3.145 us, total = 67.876 ms, Queueing time: mean = 89.075 us, max = 6.066 ms, min = 3.126 us, total = 1.922 s [state-dump] ObjectManager.UpdateAvailableMemory - 21581 total (0 active), Execution time: mean = 5.436 us, total = 117.313 ms, Queueing time: mean = 97.425 us, max = 1.104 ms, min = 2.197 us, total = 2.103 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10797 total (1 active), Execution time: mean = 17.006 us, total = 183.611 ms, Queueing time: mean = 68.123 us, max = 1.134 ms, min = 5.101 us, total = 735.524 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8623 total (1 active), Execution time: mean = 439.617 us, total = 3.791 s, Queueing time: mean = 69.545 us, max = 13.366 ms, min = 3.140 us, total = 599.683 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2160 total (1 active), Execution time: mean = 8.269 us, total = 17.860 ms, Queueing time: mean = 172.160 us, max = 3.537 ms, min = 2.019 us, total = 371.865 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2160 total (1 active), Execution time: mean = 15.140 us, 
total = 32.702 ms, Queueing time: mean = 66.024 us, max = 2.658 ms, min = 7.553 us, total = 142.611 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2160 total (1 active), Execution time: mean = 3.228 us, total = 6.973 ms, Queueing time: mean = 175.397 us, max = 3.551 ms, min = 4.207 us, total = 378.857 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2159 total (0 active), Execution time: mean = 595.554 us, total = 1.286 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2159 total (0 active), Execution time: mean = 104.657 us, total = 225.954 ms, Queueing time: mean = 103.367 us, max = 901.407 us, min = 13.883 us, total = 223.168 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 721 total (1 active), Execution time: mean = 8.293 us, total = 5.979 ms, Queueing time: mean = 67.639 us, max = 496.804 us, min = 12.362 us, total = 48.768 ms [state-dump] NodeManager.deadline_timer.record_metrics - 432 total (1 active), Execution time: mean = 517.696 us, total = 223.645 ms, Queueing time: mean = 373.007 us, max = 1.908 ms, min = 8.783 us, total = 161.139 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 432 total (0 active), Execution time: mean = 1.386 ms, total = 598.878 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 432 total (1 active), Execution time: mean = 283.512 us, total = 122.477 ms, Queueing time: mean = 606.386 us, max = 2.199 ms, min = 5.323 us, total = 261.959 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 432 total (0 active), Execution time: mean = 49.518 us, total = 21.392 ms, Queueing time: mean = 100.354 us, max = 237.873 us, min = 6.906 us, total = 43.353 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 216 total (1 
active), Execution time: mean = 1.713 ms, total = 369.918 ms, Queueing time: mean = 66.398 us, max = 174.696 us, min = 11.175 us, total = 14.342 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 36 total (1 active, 1 running), Execution time: mean = 2.632 ms, total = 94.762 ms, Queueing time: mean = 60.609 us, max = 172.215 us, min = 17.325 us, total = 2.182 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
345.190 us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:49:16,207 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:49:16,313 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 194024 total (35 active) [state-dump] Queueing time: mean = 18.398 ms, 
max = 590.169 s, min = -0.000 s, total = 3569.685 s [state-dump] Execution time: mean = 9.437 ms, total = 1831.026 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 46599 total (0 active), Execution time: mean = 34.397 us, total = 1.603 s, Queueing time: mean = 103.786 us, max = 23.460 ms, min = 2.000 us, total = 4.836 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 46599 total (0 active), Execution time: mean = 492.036 us, total = 22.928 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 22181 total (1 active), Execution time: mean = 10.139 us, total = 224.886 ms, Queueing time: mean = 82.919 us, max = 6.066 ms, min = -0.000 s, total = 1.839 s [state-dump] NodeManager.CheckGC - 22181 total (1 active), Execution time: mean = 3.147 us, total = 69.802 ms, Queueing time: mean = 88.975 us, max = 6.066 ms, min = 3.126 us, total = 1.974 s [state-dump] ObjectManager.UpdateAvailableMemory - 22180 total (0 active), Execution time: mean = 5.438 us, total = 120.618 ms, Queueing time: mean = 97.430 us, max = 1.104 ms, min = 2.197 us, total = 2.161 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11096 total (1 active), Execution time: mean = 17.012 us, total = 188.766 ms, Queueing time: mean = 68.200 us, max = 1.134 ms, min = 5.101 us, total = 756.748 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8863 total (1 active), Execution time: mean = 439.473 us, total = 3.895 s, Queueing time: mean = 69.582 us, max = 13.366 ms, min = 3.140 us, total = 616.708 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2220 total (1 active), Execution time: mean = 8.298 us, total = 18.421 ms, Queueing time: mean = 172.576 us, max = 3.537 ms, min = 2.019 us, total = 383.120 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2220 total (1 active), Execution time: mean = 15.151 us, 
total = 33.635 ms, Queueing time: mean = 65.734 us, max = 2.658 ms, min = 7.553 us, total = 145.929 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2220 total (1 active), Execution time: mean = 3.229 us, total = 7.169 ms, Queueing time: mean = 175.815 us, max = 3.551 ms, min = 4.207 us, total = 390.310 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2219 total (0 active), Execution time: mean = 595.513 us, total = 1.321 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2219 total (0 active), Execution time: mean = 104.695 us, total = 232.318 ms, Queueing time: mean = 103.254 us, max = 901.407 us, min = 13.883 us, total = 229.121 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 741 total (1 active), Execution time: mean = 8.313 us, total = 6.160 ms, Queueing time: mean = 67.604 us, max = 496.804 us, min = 12.362 us, total = 50.094 ms [state-dump] NodeManager.deadline_timer.record_metrics - 444 total (1 active), Execution time: mean = 518.649 us, total = 230.280 ms, Queueing time: mean = 373.694 us, max = 1.908 ms, min = 8.783 us, total = 165.920 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 444 total (0 active), Execution time: mean = 1.389 ms, total = 616.675 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 444 total (1 active), Execution time: mean = 284.640 us, total = 126.380 ms, Queueing time: mean = 606.926 us, max = 2.199 ms, min = 5.323 us, total = 269.475 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 444 total (0 active), Execution time: mean = 49.581 us, total = 22.014 ms, Queueing time: mean = 100.217 us, max = 237.873 us, min = 6.906 us, total = 44.496 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 222 total (1 
active), Execution time: mean = 1.715 ms, total = 380.697 ms, Queueing time: mean = 66.266 us, max = 174.696 us, min = 11.175 us, total = 14.711 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 37 total (1 active, 1 running), Execution time: mean = 2.642 ms, total = 97.769 ms, Queueing time: mean = 60.652 us, max = 172.215 us, min = 17.325 us, total = 2.244 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
345.190 us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:50:16,208 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:50:16,314 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 199258 total (35 active) [state-dump] Queueing time: mean = 17.917 ms, 
max = 590.169 s, min = -0.000 s, total = 3570.036 s [state-dump] Execution time: mean = 9.193 ms, total = 1831.837 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 47859 total (0 active), Execution time: mean = 34.285 us, total = 1.641 s, Queueing time: mean = 103.542 us, max = 23.460 ms, min = 2.000 us, total = 4.955 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 47859 total (0 active), Execution time: mean = 491.025 us, total = 23.500 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 22781 total (1 active), Execution time: mean = 10.129 us, total = 230.745 ms, Queueing time: mean = 82.767 us, max = 6.066 ms, min = -0.000 s, total = 1.886 s [state-dump] NodeManager.CheckGC - 22781 total (1 active), Execution time: mean = 3.146 us, total = 71.674 ms, Queueing time: mean = 88.816 us, max = 6.066 ms, min = 3.126 us, total = 2.023 s [state-dump] ObjectManager.UpdateAvailableMemory - 22780 total (0 active), Execution time: mean = 5.434 us, total = 123.784 ms, Queueing time: mean = 97.314 us, max = 1.104 ms, min = 2.197 us, total = 2.217 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11396 total (1 active), Execution time: mean = 17.030 us, total = 194.074 ms, Queueing time: mean = 68.388 us, max = 1.134 ms, min = 5.101 us, total = 779.352 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9102 total (1 active), Execution time: mean = 439.352 us, total = 3.999 s, Queueing time: mean = 69.809 us, max = 13.366 ms, min = 3.140 us, total = 635.400 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2280 total (1 active), Execution time: mean = 8.298 us, total = 18.919 ms, Queueing time: mean = 171.911 us, max = 3.537 ms, min = 2.019 us, total = 391.956 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2280 total (1 active), Execution time: mean = 15.143 us, 
total = 34.526 ms, Queueing time: mean = 65.554 us, max = 2.658 ms, min = 7.553 us, total = 149.463 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2280 total (1 active), Execution time: mean = 3.225 us, total = 7.353 ms, Queueing time: mean = 175.153 us, max = 3.551 ms, min = 4.207 us, total = 399.348 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2279 total (0 active), Execution time: mean = 595.231 us, total = 1.357 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2279 total (0 active), Execution time: mean = 104.596 us, total = 238.373 ms, Queueing time: mean = 103.076 us, max = 901.407 us, min = 13.883 us, total = 234.910 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 761 total (1 active), Execution time: mean = 8.315 us, total = 6.328 ms, Queueing time: mean = 67.738 us, max = 496.804 us, min = 12.362 us, total = 51.549 ms [state-dump] NodeManager.deadline_timer.record_metrics - 456 total (1 active), Execution time: mean = 518.222 us, total = 236.309 ms, Queueing time: mean = 371.041 us, max = 1.908 ms, min = 8.783 us, total = 169.195 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 456 total (0 active), Execution time: mean = 1.391 ms, total = 634.518 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 456 total (1 active), Execution time: mean = 284.869 us, total = 129.900 ms, Queueing time: mean = 603.736 us, max = 2.199 ms, min = 5.323 us, total = 275.303 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 456 total (0 active), Execution time: mean = 49.548 us, total = 22.594 ms, Queueing time: mean = 100.037 us, max = 237.873 us, min = 6.906 us, total = 45.617 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 228 total (1 
active), Execution time: mean = 1.709 ms, total = 389.563 ms, Queueing time: mean = 66.104 us, max = 174.696 us, min = 11.175 us, total = 15.072 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 38 total (1 active, 1 running), Execution time: mean = 2.607 ms, total = 99.076 ms, Queueing time: mean = 60.230 us, max = 172.215 us, min = 17.325 us, total = 2.289 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
345.190 us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-21 05:51:16,208 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:51:16,317 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 204490 total (35 active) [state-dump] Queueing time: mean = 17.459 ms, 
max = 590.169 s, min = -0.000 s, total = 3570.283 s [state-dump] Execution time: mean = 8.961 ms, total = 1832.453 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 49119 total (0 active), Execution time: mean = 34.017 us, total = 1.671 s, Queueing time: mean = 102.320 us, max = 23.460 ms, min = 2.000 us, total = 5.026 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 49119 total (0 active), Execution time: mean = 486.568 us, total = 23.900 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 23380 total (1 active), Execution time: mean = 10.093 us, total = 235.983 ms, Queueing time: mean = 82.319 us, max = 6.066 ms, min = -0.000 s, total = 1.925 s [state-dump] NodeManager.CheckGC - 23380 total (1 active), Execution time: mean = 3.143 us, total = 73.491 ms, Queueing time: mean = 88.340 us, max = 6.066 ms, min = 3.126 us, total = 2.065 s [state-dump] ObjectManager.UpdateAvailableMemory - 23379 total (0 active), Execution time: mean = 5.397 us, total = 126.181 ms, Queueing time: mean = 96.043 us, max = 1.104 ms, min = 2.197 us, total = 2.245 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11696 total (1 active), Execution time: mean = 16.978 us, total = 198.571 ms, Queueing time: mean = 68.113 us, max = 1.134 ms, min = 5.101 us, total = 796.645 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9342 total (1 active), Execution time: mean = 438.881 us, total = 4.100 s, Queueing time: mean = 69.612 us, max = 13.366 ms, min = 3.140 us, total = 650.312 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2340 total (1 active), Execution time: mean = 8.260 us, total = 19.329 ms, Queueing time: mean = 171.113 us, max = 3.537 ms, min = 2.019 us, total = 400.404 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2340 total (1 active), Execution time: mean = 15.102 us, 
total = 35.339 ms, Queueing time: mean = 65.291 us, max = 2.658 ms, min = 7.553 us, total = 152.781 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2340 total (1 active), Execution time: mean = 3.216 us, total = 7.525 ms, Queueing time: mean = 174.339 us, max = 3.551 ms, min = 4.207 us, total = 407.954 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2339 total (0 active), Execution time: mean = 591.599 us, total = 1.384 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2339 total (0 active), Execution time: mean = 104.377 us, total = 244.137 ms, Queueing time: mean = 101.979 us, max = 901.407 us, min = 13.883 us, total = 238.529 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 781 total (1 active), Execution time: mean = 8.316 us, total = 6.495 ms, Queueing time: mean = 67.438 us, max = 496.804 us, min = 12.362 us, total = 52.669 ms [state-dump] NodeManager.deadline_timer.record_metrics - 468 total (1 active), Execution time: mean = 518.361 us, total = 242.593 ms, Queueing time: mean = 367.459 us, max = 1.908 ms, min = 8.783 us, total = 171.971 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 468 total (0 active), Execution time: mean = 1.391 ms, total = 651.134 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 468 total (1 active), Execution time: mean = 285.214 us, total = 133.480 ms, Queueing time: mean = 599.874 us, max = 2.199 ms, min = 5.323 us, total = 280.741 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 468 total (0 active), Execution time: mean = 49.522 us, total = 23.176 ms, Queueing time: mean = 98.971 us, max = 237.873 us, min = 6.906 us, total = 46.318 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 234 total (1 active), 
Execution time: mean = 1.704 ms, total = 398.742 ms, Queueing time: mean = 65.207 us, max = 174.696 us, min = 11.175 us, total = 15.258 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 39 total (1 active, 1 running), Execution time: mean = 2.575 ms, total = 100.424 ms, Queueing time: mean = 59.105 us, max = 172.215 us, min = 16.335 us, total = 2.305 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.751 s, total = 1798.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 345.190 
us, total = 1.381 ms, Queueing time: mean = 75.774 us, max = 184.802 us, min = 20.320 us, total = 303.095 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:52:16,208 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:52:16,320 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 209727 total (35 active) [state-dump] Queueing time: mean = 17.025 ms, 
max = 590.169 s, min = -0.000 s, total = 3570.604 s [state-dump] Execution time: mean = 11.602 ms, total = 2433.223 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 50379 total (0 active), Execution time: mean = 33.912 us, total = 1.708 s, Queueing time: mean = 101.933 us, max = 23.460 ms, min = 2.000 us, total = 5.135 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 50379 total (0 active), Execution time: mean = 485.015 us, total = 24.435 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 23980 total (1 active), Execution time: mean = 10.043 us, total = 240.840 ms, Queueing time: mean = 82.072 us, max = 6.066 ms, min = -0.000 s, total = 1.968 s [state-dump] NodeManager.CheckGC - 23980 total (1 active), Execution time: mean = 3.135 us, total = 75.175 ms, Queueing time: mean = 88.054 us, max = 6.066 ms, min = 3.126 us, total = 2.112 s [state-dump] ObjectManager.UpdateAvailableMemory - 23979 total (0 active), Execution time: mean = 5.370 us, total = 128.768 ms, Queueing time: mean = 95.535 us, max = 1.104 ms, min = 2.197 us, total = 2.291 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11996 total (1 active), Execution time: mean = 16.901 us, total = 202.748 ms, Queueing time: mean = 67.815 us, max = 1.134 ms, min = 5.101 us, total = 813.504 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9582 total (1 active), Execution time: mean = 438.371 us, total = 4.200 s, Queueing time: mean = 69.350 us, max = 13.366 ms, min = 3.140 us, total = 664.509 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2400 total (1 active), Execution time: mean = 8.257 us, total = 19.817 ms, Queueing time: mean = 171.323 us, max = 3.537 ms, min = 2.019 us, total = 411.176 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2400 total (1 active), Execution time: mean = 15.074 
us, total = 36.178 ms, Queueing time: mean = 65.061 us, max = 2.658 ms, min = 7.553 us, total = 156.147 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2400 total (1 active), Execution time: mean = 3.224 us, total = 7.738 ms, Queueing time: mean = 174.538 us, max = 3.551 ms, min = 4.207 us, total = 418.891 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2399 total (0 active), Execution time: mean = 590.056 us, total = 1.416 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2399 total (0 active), Execution time: mean = 104.173 us, total = 249.910 ms, Queueing time: mean = 101.849 us, max = 901.407 us, min = 13.883 us, total = 244.336 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 801 total (1 active), Execution time: mean = 8.289 us, total = 6.639 ms, Queueing time: mean = 67.107 us, max = 496.804 us, min = 12.362 us, total = 53.753 ms [state-dump] NodeManager.deadline_timer.record_metrics - 480 total (1 active), Execution time: mean = 518.424 us, total = 248.844 ms, Queueing time: mean = 368.153 us, max = 1.908 ms, min = 8.783 us, total = 176.713 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 480 total (0 active), Execution time: mean = 1.392 ms, total = 668.396 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 480 total (1 active), Execution time: mean = 285.732 us, total = 137.151 ms, Queueing time: mean = 600.131 us, max = 2.199 ms, min = 5.323 us, total = 288.063 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 480 total (0 active), Execution time: mean = 49.565 us, total = 23.791 ms, Queueing time: mean = 98.733 us, max = 237.873 us, min = 6.906 us, total = 47.392 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 240 total (1 
active), Execution time: mean = 1.706 ms, total = 409.334 ms, Queueing time: mean = 64.846 us, max = 174.696 us, min = 11.175 us, total = 15.563 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 40 total (1 active, 1 running), Execution time: mean = 2.581 ms, total = 103.246 ms, Queueing time: mean = 58.284 us, max = 172.215 us, min = 16.335 us, total = 2.331 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:53:16,208 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:53:16,322 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 214958 total (35 active) [state-dump] Queueing time: mean = 16.613 ms, 
max = 590.169 s, min = -0.000 s, total = 3571.009 s [state-dump] Execution time: mean = 11.324 ms, total = 2434.159 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 51639 total (0 active), Execution time: mean = 33.977 us, total = 1.755 s, Queueing time: mean = 102.242 us, max = 23.460 ms, min = 2.000 us, total = 5.280 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 51639 total (0 active), Execution time: mean = 486.346 us, total = 25.114 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 24579 total (1 active), Execution time: mean = 10.015 us, total = 246.170 ms, Queueing time: mean = 82.130 us, max = 6.066 ms, min = -0.000 s, total = 2.019 s [state-dump] NodeManager.CheckGC - 24579 total (1 active), Execution time: mean = 3.131 us, total = 76.951 ms, Queueing time: mean = 88.088 us, max = 6.066 ms, min = 3.126 us, total = 2.165 s [state-dump] ObjectManager.UpdateAvailableMemory - 24578 total (0 active), Execution time: mean = 5.371 us, total = 132.009 ms, Queueing time: mean = 96.023 us, max = 1.104 ms, min = 2.197 us, total = 2.360 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12296 total (1 active), Execution time: mean = 16.898 us, total = 207.783 ms, Queueing time: mean = 68.140 us, max = 1.134 ms, min = 5.101 us, total = 837.848 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9821 total (1 active), Execution time: mean = 438.463 us, total = 4.306 s, Queueing time: mean = 69.480 us, max = 13.366 ms, min = 3.140 us, total = 682.368 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2460 total (1 active), Execution time: mean = 8.263 us, total = 20.326 ms, Queueing time: mean = 171.360 us, max = 3.537 ms, min = 2.019 us, total = 421.545 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2460 total (1 active), Execution time: mean = 15.098 
us, total = 37.142 ms, Queueing time: mean = 65.136 us, max = 2.658 ms, min = 7.553 us, total = 160.236 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2460 total (1 active), Execution time: mean = 3.225 us, total = 7.934 ms, Queueing time: mean = 174.580 us, max = 3.551 ms, min = 4.207 us, total = 429.468 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2459 total (0 active), Execution time: mean = 590.737 us, total = 1.453 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2459 total (0 active), Execution time: mean = 104.137 us, total = 256.073 ms, Queueing time: mean = 101.865 us, max = 901.407 us, min = 13.883 us, total = 250.487 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 821 total (1 active), Execution time: mean = 8.285 us, total = 6.802 ms, Queueing time: mean = 67.149 us, max = 496.804 us, min = 12.362 us, total = 55.129 ms [state-dump] NodeManager.deadline_timer.record_metrics - 492 total (1 active), Execution time: mean = 518.992 us, total = 255.344 ms, Queueing time: mean = 367.714 us, max = 1.908 ms, min = 8.783 us, total = 180.915 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 492 total (0 active), Execution time: mean = 1.398 ms, total = 688.020 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 492 total (1 active), Execution time: mean = 286.969 us, total = 141.189 ms, Queueing time: mean = 599.078 us, max = 2.199 ms, min = 5.323 us, total = 294.746 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 492 total (0 active), Execution time: mean = 49.734 us, total = 24.469 ms, Queueing time: mean = 99.226 us, max = 237.873 us, min = 6.906 us, total = 48.819 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 246 total (1 
active), Execution time: mean = 1.706 ms, total = 419.578 ms, Queueing time: mean = 65.096 us, max = 174.696 us, min = 11.175 us, total = 16.014 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 41 total (1 active, 1 running), Execution time: mean = 2.582 ms, total = 105.854 ms, Queueing time: mean = 58.200 us, max = 172.215 us, min = 16.335 us, total = 2.386 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:54:16,209 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:54:16,324 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 220193 total (35 active) [state-dump] Queueing time: mean = 16.220 ms, 
max = 590.169 s, min = -0.000 s, total = 3571.428 s [state-dump] Execution time: mean = 11.059 ms, total = 2435.111 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 52899 total (0 active), Execution time: mean = 34.061 us, total = 1.802 s, Queueing time: mean = 102.576 us, max = 23.460 ms, min = 2.000 us, total = 5.426 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 52899 total (0 active), Execution time: mean = 487.872 us, total = 25.808 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 25179 total (1 active), Execution time: mean = 10.001 us, total = 251.813 ms, Queueing time: mean = 82.353 us, max = 6.066 ms, min = -0.000 s, total = 2.074 s [state-dump] NodeManager.CheckGC - 25179 total (1 active), Execution time: mean = 3.131 us, total = 78.825 ms, Queueing time: mean = 88.299 us, max = 6.066 ms, min = 3.126 us, total = 2.223 s [state-dump] ObjectManager.UpdateAvailableMemory - 25178 total (0 active), Execution time: mean = 5.377 us, total = 135.374 ms, Queueing time: mean = 96.450 us, max = 1.104 ms, min = 2.197 us, total = 2.428 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12596 total (1 active), Execution time: mean = 16.910 us, total = 213.000 ms, Queueing time: mean = 68.483 us, max = 1.134 ms, min = 5.101 us, total = 862.611 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10061 total (1 active), Execution time: mean = 438.447 us, total = 4.411 s, Queueing time: mean = 69.626 us, max = 13.366 ms, min = 3.140 us, total = 700.507 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2520 total (1 active), Execution time: mean = 8.266 us, total = 20.831 ms, Queueing time: mean = 171.508 us, max = 3.537 ms, min = 2.019 us, total = 432.200 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2520 total (1 active), Execution time: mean = 15.140 
us, total = 38.153 ms, Queueing time: mean = 65.245 us, max = 2.658 ms, min = 7.553 us, total = 164.417 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2520 total (1 active), Execution time: mean = 3.229 us, total = 8.137 ms, Queueing time: mean = 174.728 us, max = 3.551 ms, min = 4.207 us, total = 440.314 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2519 total (0 active), Execution time: mean = 591.545 us, total = 1.490 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2519 total (0 active), Execution time: mean = 104.209 us, total = 262.502 ms, Queueing time: mean = 102.202 us, max = 901.407 us, min = 13.883 us, total = 257.447 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 841 total (1 active), Execution time: mean = 8.287 us, total = 6.969 ms, Queueing time: mean = 67.283 us, max = 496.804 us, min = 12.362 us, total = 56.585 ms [state-dump] NodeManager.deadline_timer.record_metrics - 504 total (1 active), Execution time: mean = 519.152 us, total = 261.652 ms, Queueing time: mean = 368.648 us, max = 1.908 ms, min = 8.783 us, total = 185.799 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 504 total (0 active), Execution time: mean = 1.404 ms, total = 707.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 504 total (1 active), Execution time: mean = 287.759 us, total = 145.031 ms, Queueing time: mean = 599.255 us, max = 2.199 ms, min = 5.323 us, total = 302.024 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 504 total (0 active), Execution time: mean = 49.957 us, total = 25.178 ms, Queueing time: mean = 99.352 us, max = 237.873 us, min = 6.906 us, total = 50.073 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 252 total (1 
active), Execution time: mean = 1.708 ms, total = 430.517 ms, Queueing time: mean = 65.055 us, max = 174.696 us, min = 11.175 us, total = 16.394 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 42 total (1 active, 1 running), Execution time: mean = 2.589 ms, total = 108.743 ms, Queueing time: mean = 58.246 us, max = 172.215 us, min = 16.335 us, total = 2.446 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:55:16,209 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:55:16,327 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 225424 total (35 active) [state-dump] Queueing time: mean = 15.845 ms, 
max = 590.169 s, min = -0.000 s, total = 3571.849 s [state-dump] Execution time: mean = 10.807 ms, total = 2436.054 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 54159 total (0 active), Execution time: mean = 34.151 us, total = 1.850 s, Queueing time: mean = 103.003 us, max = 23.460 ms, min = 2.000 us, total = 5.579 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 54159 total (0 active), Execution time: mean = 489.152 us, total = 26.492 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 25778 total (1 active), Execution time: mean = 9.976 us, total = 257.153 ms, Queueing time: mean = 82.494 us, max = 6.066 ms, min = -0.000 s, total = 2.127 s [state-dump] NodeManager.CheckGC - 25778 total (1 active), Execution time: mean = 3.125 us, total = 80.562 ms, Queueing time: mean = 88.421 us, max = 6.066 ms, min = 3.126 us, total = 2.279 s [state-dump] ObjectManager.UpdateAvailableMemory - 25777 total (0 active), Execution time: mean = 5.369 us, total = 138.394 ms, Queueing time: mean = 96.911 us, max = 1.104 ms, min = 2.197 us, total = 2.498 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12896 total (1 active), Execution time: mean = 16.904 us, total = 217.999 ms, Queueing time: mean = 68.665 us, max = 1.377 ms, min = 5.101 us, total = 885.507 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10300 total (1 active), Execution time: mean = 438.652 us, total = 4.518 s, Queueing time: mean = 69.686 us, max = 13.366 ms, min = 3.140 us, total = 717.768 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2580 total (1 active), Execution time: mean = 8.263 us, total = 21.320 ms, Queueing time: mean = 171.983 us, max = 3.537 ms, min = 2.019 us, total = 443.717 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2580 total (1 active), Execution time: mean = 15.185 
us, total = 39.177 ms, Queueing time: mean = 65.353 us, max = 2.658 ms, min = 7.553 us, total = 168.610 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2580 total (1 active), Execution time: mean = 3.231 us, total = 8.336 ms, Queueing time: mean = 175.202 us, max = 3.551 ms, min = 4.207 us, total = 452.021 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2579 total (0 active), Execution time: mean = 592.346 us, total = 1.528 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2579 total (0 active), Execution time: mean = 104.131 us, total = 268.554 ms, Queueing time: mean = 102.378 us, max = 901.407 us, min = 13.551 us, total = 264.033 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 861 total (1 active), Execution time: mean = 8.283 us, total = 7.132 ms, Queueing time: mean = 67.263 us, max = 496.804 us, min = 12.362 us, total = 57.913 ms [state-dump] NodeManager.deadline_timer.record_metrics - 516 total (1 active), Execution time: mean = 520.003 us, total = 268.322 ms, Queueing time: mean = 370.004 us, max = 1.908 ms, min = 8.783 us, total = 190.922 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 516 total (0 active), Execution time: mean = 1.409 ms, total = 726.931 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 516 total (1 active), Execution time: mean = 289.255 us, total = 149.256 ms, Queueing time: mean = 600.079 us, max = 2.199 ms, min = 5.323 us, total = 309.641 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 516 total (0 active), Execution time: mean = 50.162 us, total = 25.883 ms, Queueing time: mean = 99.875 us, max = 237.873 us, min = 6.906 us, total = 51.536 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 258 total (1 
active), Execution time: mean = 1.712 ms, total = 441.633 ms, Queueing time: mean = 65.134 us, max = 174.696 us, min = 11.175 us, total = 16.805 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 43 total (1 active, 1 running), Execution time: mean = 2.565 ms, total = 110.316 ms, Queueing time: mean = 58.874 us, max = 172.215 us, min = 16.335 us, total = 2.532 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:56:16,210 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:56:16,330 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 230659 total (35 active) [state-dump] Queueing time: mean = 15.487 ms, 
max = 590.169 s, min = -0.000 s, total = 3572.217 s [state-dump] Execution time: mean = 10.565 ms, total = 2436.913 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 55419 total (0 active), Execution time: mean = 34.119 us, total = 1.891 s, Queueing time: mean = 102.934 us, max = 23.460 ms, min = 2.000 us, total = 5.705 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 55419 total (0 active), Execution time: mean = 489.121 us, total = 27.107 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 26378 total (1 active), Execution time: mean = 9.955 us, total = 262.586 ms, Queueing time: mean = 82.573 us, max = 6.066 ms, min = -0.000 s, total = 2.178 s [state-dump] NodeManager.CheckGC - 26378 total (1 active), Execution time: mean = 3.121 us, total = 82.337 ms, Queueing time: mean = 88.483 us, max = 6.066 ms, min = 3.126 us, total = 2.334 s [state-dump] ObjectManager.UpdateAvailableMemory - 26377 total (0 active), Execution time: mean = 5.357 us, total = 141.298 ms, Queueing time: mean = 96.792 us, max = 1.104 ms, min = 2.197 us, total = 2.553 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13196 total (1 active), Execution time: mean = 16.889 us, total = 222.868 ms, Queueing time: mean = 68.849 us, max = 2.889 ms, min = 5.101 us, total = 908.528 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10540 total (1 active), Execution time: mean = 438.452 us, total = 4.621 s, Queueing time: mean = 69.564 us, max = 13.366 ms, min = 3.140 us, total = 733.205 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2640 total (1 active), Execution time: mean = 8.758 us, total = 23.121 ms, Queueing time: mean = 171.427 us, max = 3.537 ms, min = -0.000 s, total = 452.568 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2640 total (1 active), Execution time: mean = 15.202 
us, total = 40.134 ms, Queueing time: mean = 65.351 us, max = 2.658 ms, min = 7.553 us, total = 172.527 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2640 total (1 active), Execution time: mean = 3.236 us, total = 8.544 ms, Queueing time: mean = 175.130 us, max = 3.551 ms, min = 4.207 us, total = 462.342 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2639 total (0 active), Execution time: mean = 592.388 us, total = 1.563 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2639 total (0 active), Execution time: mean = 103.995 us, total = 274.442 ms, Queueing time: mean = 102.364 us, max = 901.407 us, min = 13.334 us, total = 270.139 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 881 total (1 active), Execution time: mean = 8.285 us, total = 7.299 ms, Queueing time: mean = 67.306 us, max = 496.804 us, min = 12.362 us, total = 59.296 ms [state-dump] NodeManager.deadline_timer.record_metrics - 528 total (1 active), Execution time: mean = 519.170 us, total = 274.122 ms, Queueing time: mean = 369.269 us, max = 1.908 ms, min = 8.783 us, total = 194.974 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 528 total (0 active), Execution time: mean = 1.411 ms, total = 744.910 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 528 total (1 active), Execution time: mean = 289.529 us, total = 152.871 ms, Queueing time: mean = 598.220 us, max = 2.199 ms, min = 5.323 us, total = 315.860 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 528 total (0 active), Execution time: mean = 50.206 us, total = 26.509 ms, Queueing time: mean = 99.821 us, max = 237.873 us, min = 6.906 us, total = 52.705 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 264 total (1 
active), Execution time: mean = 1.708 ms, total = 451.039 ms, Queueing time: mean = 64.766 us, max = 174.696 us, min = 11.175 us, total = 17.098 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 44 total (1 active, 1 running), Execution time: mean = 2.568 ms, total = 112.990 ms, Queueing time: mean = 58.982 us, max = 172.215 us, min = 16.335 us, total = 2.595 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 4.766 us, total = 14.297 us, Queueing time: mean = 30.682 us, max = 57.450 us, min = 34.595 us, total = 92.045 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:57:16,210 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:57:16,332 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 235891 total (35 active) [state-dump] Queueing time: mean = 15.145 ms, 
max = 590.169 s, min = -0.000 s, total = 3572.526 s [state-dump] Execution time: mean = 10.334 ms, total = 2437.648 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 56679 total (0 active), Execution time: mean = 34.007 us, total = 1.927 s, Queueing time: mean = 102.412 us, max = 23.460 ms, min = 2.000 us, total = 5.805 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 56679 total (0 active), Execution time: mean = 487.105 us, total = 27.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 26977 total (1 active), Execution time: mean = 9.928 us, total = 267.817 ms, Queueing time: mean = 82.308 us, max = 6.066 ms, min = -0.000 s, total = 2.220 s [state-dump] NodeManager.CheckGC - 26977 total (1 active), Execution time: mean = 3.118 us, total = 84.118 ms, Queueing time: mean = 88.197 us, max = 6.066 ms, min = 3.126 us, total = 2.379 s [state-dump] ObjectManager.UpdateAvailableMemory - 26976 total (0 active), Execution time: mean = 5.342 us, total = 144.097 ms, Queueing time: mean = 96.335 us, max = 1.104 ms, min = 2.197 us, total = 2.599 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13496 total (1 active), Execution time: mean = 16.854 us, total = 227.465 ms, Queueing time: mean = 68.616 us, max = 2.889 ms, min = 5.101 us, total = 926.045 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10779 total (1 active), Execution time: mean = 438.244 us, total = 4.724 s, Queueing time: mean = 69.336 us, max = 13.366 ms, min = 3.140 us, total = 747.370 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2700 total (1 active), Execution time: mean = 8.740 us, total = 23.597 ms, Queueing time: mean = 171.309 us, max = 3.537 ms, min = -0.000 s, total = 462.535 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2700 total (1 active), Execution time: mean = 15.206 
us, total = 41.056 ms, Queueing time: mean = 65.029 us, max = 2.658 ms, min = 7.553 us, total = 175.580 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2700 total (1 active), Execution time: mean = 3.238 us, total = 8.742 ms, Queueing time: mean = 174.994 us, max = 3.551 ms, min = 4.207 us, total = 472.484 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2699 total (0 active), Execution time: mean = 590.541 us, total = 1.594 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2699 total (0 active), Execution time: mean = 103.842 us, total = 280.270 ms, Queueing time: mean = 101.969 us, max = 901.407 us, min = 13.334 us, total = 275.214 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 901 total (1 active), Execution time: mean = 8.269 us, total = 7.450 ms, Queueing time: mean = 67.121 us, max = 496.804 us, min = 12.362 us, total = 60.476 ms [state-dump] NodeManager.deadline_timer.record_metrics - 540 total (1 active), Execution time: mean = 518.830 us, total = 280.168 ms, Queueing time: mean = 370.178 us, max = 1.908 ms, min = 8.783 us, total = 199.896 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 540 total (0 active), Execution time: mean = 1.411 ms, total = 762.128 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 540 total (1 active), Execution time: mean = 290.028 us, total = 156.615 ms, Queueing time: mean = 598.353 us, max = 2.199 ms, min = 5.323 us, total = 323.111 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 540 total (0 active), Execution time: mean = 50.169 us, total = 27.091 ms, Queueing time: mean = 99.594 us, max = 237.873 us, min = 6.906 us, total = 53.781 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 270 total (1 
active), Execution time: mean = 1.711 ms, total = 461.874 ms, Queueing time: mean = 64.582 us, max = 174.696 us, min = 11.175 us, total = 17.437 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 45 total (1 active, 1 running), Execution time: mean = 2.571 ms, total = 115.704 ms, Queueing time: mean = 59.210 us, max = 172.215 us, min = 16.335 us, total = 2.664 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 05:58:16,210 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:58:16,335 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 241125 total (35 active) [state-dump] Queueing time: mean = 14.818 ms, 
max = 590.169 s, min = -0.000 s, total = 3572.898 s [state-dump] Execution time: mean = 10.113 ms, total = 2438.536 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 57939 total (0 active), Execution time: mean = 34.056 us, total = 1.973 s, Queueing time: mean = 102.518 us, max = 23.460 ms, min = 2.000 us, total = 5.940 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 57939 total (0 active), Execution time: mean = 487.494 us, total = 28.245 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 27577 total (1 active), Execution time: mean = 9.914 us, total = 273.387 ms, Queueing time: mean = 82.284 us, max = 6.066 ms, min = -0.000 s, total = 2.269 s [state-dump] NodeManager.CheckGC - 27577 total (1 active), Execution time: mean = 3.116 us, total = 85.916 ms, Queueing time: mean = 88.161 us, max = 6.066 ms, min = 3.126 us, total = 2.431 s [state-dump] ObjectManager.UpdateAvailableMemory - 27576 total (0 active), Execution time: mean = 5.337 us, total = 147.184 ms, Queueing time: mean = 96.272 us, max = 1.104 ms, min = 2.197 us, total = 2.655 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13795 total (1 active), Execution time: mean = 16.836 us, total = 232.258 ms, Queueing time: mean = 68.577 us, max = 2.889 ms, min = 5.101 us, total = 946.020 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11019 total (1 active), Execution time: mean = 438.297 us, total = 4.830 s, Queueing time: mean = 69.241 us, max = 13.366 ms, min = 3.140 us, total = 762.969 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2760 total (1 active), Execution time: mean = 8.723 us, total = 24.075 ms, Queueing time: mean = 171.281 us, max = 3.537 ms, min = -0.000 s, total = 472.737 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2760 total (1 active), Execution time: mean = 15.203 
us, total = 41.962 ms, Queueing time: mean = 65.069 us, max = 2.658 ms, min = 7.553 us, total = 179.590 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2760 total (1 active), Execution time: mean = 3.236 us, total = 8.931 ms, Queueing time: mean = 174.952 us, max = 3.551 ms, min = 4.207 us, total = 482.869 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2759 total (0 active), Execution time: mean = 590.470 us, total = 1.629 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2759 total (0 active), Execution time: mean = 103.738 us, total = 286.213 ms, Queueing time: mean = 102.026 us, max = 901.407 us, min = 13.334 us, total = 281.490 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 921 total (1 active), Execution time: mean = 8.257 us, total = 7.604 ms, Queueing time: mean = 66.929 us, max = 496.804 us, min = 12.362 us, total = 61.641 ms [state-dump] NodeManager.deadline_timer.record_metrics - 552 total (1 active), Execution time: mean = 518.906 us, total = 286.436 ms, Queueing time: mean = 370.318 us, max = 1.908 ms, min = 8.783 us, total = 204.415 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 552 total (0 active), Execution time: mean = 1.414 ms, total = 780.552 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 552 total (1 active), Execution time: mean = 290.439 us, total = 160.322 ms, Queueing time: mean = 598.085 us, max = 2.199 ms, min = 5.323 us, total = 330.143 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 552 total (0 active), Execution time: mean = 50.166 us, total = 27.692 ms, Queueing time: mean = 99.826 us, max = 237.873 us, min = 6.906 us, total = 55.104 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 276 total (1 
active), Execution time: mean = 1.712 ms, total = 472.441 ms, Queueing time: mean = 64.423 us, max = 174.696 us, min = 11.175 us, total = 17.781 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 46 total (1 active, 1 running), Execution time: mean = 2.571 ms, total = 118.285 ms, Queueing time: mean = 59.246 us, max = 172.215 us, min = 16.335 us, total = 2.725 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 05:59:16,211 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 05:59:16,337 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 246356 total (35 active) [state-dump] Queueing time: mean = 14.505 ms, 
max = 590.169 s, min = -0.000 s, total = 3573.279 s [state-dump] Execution time: mean = 9.902 ms, total = 2439.360 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 59199 total (0 active), Execution time: mean = 34.027 us, total = 2.014 s, Queueing time: mean = 102.337 us, max = 23.460 ms, min = 2.000 us, total = 6.058 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 59199 total (0 active), Execution time: mean = 486.896 us, total = 28.824 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 28176 total (1 active), Execution time: mean = 9.928 us, total = 279.735 ms, Queueing time: mean = 82.460 us, max = 6.066 ms, min = -0.000 s, total = 2.323 s [state-dump] NodeManager.CheckGC - 28176 total (1 active), Execution time: mean = 3.118 us, total = 87.852 ms, Queueing time: mean = 88.349 us, max = 6.066 ms, min = 3.126 us, total = 2.489 s [state-dump] ObjectManager.UpdateAvailableMemory - 28175 total (0 active), Execution time: mean = 5.338 us, total = 150.409 ms, Queueing time: mean = 96.185 us, max = 1.104 ms, min = 2.197 us, total = 2.710 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14095 total (1 active), Execution time: mean = 16.832 us, total = 237.247 ms, Queueing time: mean = 68.814 us, max = 2.889 ms, min = 5.101 us, total = 969.934 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11258 total (1 active), Execution time: mean = 438.275 us, total = 4.934 s, Queueing time: mean = 69.172 us, max = 13.366 ms, min = 3.140 us, total = 778.737 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2820 total (1 active), Execution time: mean = 8.730 us, total = 24.618 ms, Queueing time: mean = 172.043 us, max = 3.537 ms, min = -0.000 s, total = 485.160 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2820 total (1 active), Execution time: mean = 15.235 us, 
total = 42.961 ms, Queueing time: mean = 65.160 us, max = 2.658 ms, min = 7.553 us, total = 183.751 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2820 total (1 active), Execution time: mean = 3.236 us, total = 9.126 ms, Queueing time: mean = 175.716 us, max = 3.551 ms, min = 4.207 us, total = 495.519 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2819 total (0 active), Execution time: mean = 590.088 us, total = 1.663 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2819 total (0 active), Execution time: mean = 103.541 us, total = 291.883 ms, Queueing time: mean = 102.008 us, max = 901.407 us, min = 11.691 us, total = 287.561 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 941 total (1 active), Execution time: mean = 8.275 us, total = 7.786 ms, Queueing time: mean = 66.987 us, max = 496.804 us, min = 12.362 us, total = 63.035 ms [state-dump] NodeManager.deadline_timer.record_metrics - 564 total (1 active), Execution time: mean = 519.666 us, total = 293.091 ms, Queueing time: mean = 372.849 us, max = 2.197 ms, min = 8.783 us, total = 210.287 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 564 total (0 active), Execution time: mean = 1.415 ms, total = 797.952 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 564 total (1 active), Execution time: mean = 290.869 us, total = 164.050 ms, Queueing time: mean = 601.042 us, max = 2.307 ms, min = 5.323 us, total = 338.988 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 564 total (0 active), Execution time: mean = 50.269 us, total = 28.351 ms, Queueing time: mean = 99.887 us, max = 237.873 us, min = 6.906 us, total = 56.336 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 282 total (1 active), 
Execution time: mean = 1.711 ms, total = 482.608 ms, Queueing time: mean = 69.987 us, max = 1.632 ms, min = 11.175 us, total = 19.736 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 47 total (1 active, 1 running), Execution time: mean = 2.578 ms, total = 121.155 ms, Queueing time: mean = 58.764 us, max = 172.215 us, min = 16.335 us, total = 2.762 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 339.443 
us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:00:16,211 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:00:16,340 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 251591 total (35 active) [state-dump] Queueing time: mean = 14.204 ms, 
max = 590.169 s, min = -0.000 s, total = 3573.531 s [state-dump] Execution time: mean = 9.698 ms, total = 2440.004 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 60459 total (0 active), Execution time: mean = 33.845 us, total = 2.046 s, Queueing time: mean = 101.487 us, max = 23.460 ms, min = 2.000 us, total = 6.136 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 60459 total (0 active), Execution time: mean = 483.820 us, total = 29.251 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 28776 total (1 active), Execution time: mean = 9.887 us, total = 284.500 ms, Queueing time: mean = 81.998 us, max = 6.066 ms, min = -0.000 s, total = 2.360 s [state-dump] NodeManager.CheckGC - 28776 total (1 active), Execution time: mean = 3.113 us, total = 89.571 ms, Queueing time: mean = 87.853 us, max = 6.066 ms, min = 3.126 us, total = 2.528 s [state-dump] ObjectManager.UpdateAvailableMemory - 28775 total (0 active), Execution time: mean = 5.312 us, total = 152.861 ms, Queueing time: mean = 95.438 us, max = 1.104 ms, min = 2.197 us, total = 2.746 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14395 total (1 active), Execution time: mean = 16.775 us, total = 241.475 ms, Queueing time: mean = 68.429 us, max = 2.889 ms, min = 5.101 us, total = 985.030 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11498 total (1 active), Execution time: mean = 437.954 us, total = 5.036 s, Queueing time: mean = 68.767 us, max = 13.366 ms, min = 93.000 ns, total = 790.684 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2880 total (1 active), Execution time: mean = 8.696 us, total = 25.045 ms, Queueing time: mean = 171.614 us, max = 3.537 ms, min = -0.000 s, total = 494.248 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2880 total (1 active), Execution time: mean = 15.184 
us, total = 43.730 ms, Queueing time: mean = 64.760 us, max = 2.658 ms, min = 7.553 us, total = 186.510 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2880 total (1 active), Execution time: mean = 3.230 us, total = 9.302 ms, Queueing time: mean = 175.267 us, max = 3.551 ms, min = 4.207 us, total = 504.768 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2879 total (0 active), Execution time: mean = 587.050 us, total = 1.690 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2879 total (0 active), Execution time: mean = 103.200 us, total = 297.113 ms, Queueing time: mean = 101.297 us, max = 901.407 us, min = 6.667 us, total = 291.634 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 961 total (1 active), Execution time: mean = 8.262 us, total = 7.939 ms, Queueing time: mean = 66.814 us, max = 496.804 us, min = 12.362 us, total = 64.208 ms [state-dump] NodeManager.deadline_timer.record_metrics - 576 total (1 active), Execution time: mean = 519.959 us, total = 299.496 ms, Queueing time: mean = 370.710 us, max = 2.197 ms, min = 8.783 us, total = 213.529 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 576 total (0 active), Execution time: mean = 1.411 ms, total = 812.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 576 total (1 active), Execution time: mean = 291.163 us, total = 167.710 ms, Queueing time: mean = 598.813 us, max = 2.307 ms, min = 5.323 us, total = 344.916 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 576 total (0 active), Execution time: mean = 50.079 us, total = 28.845 ms, Queueing time: mean = 99.134 us, max = 237.873 us, min = 6.906 us, total = 57.101 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 288 total (1 
active), Execution time: mean = 1.710 ms, total = 492.448 ms, Queueing time: mean = 69.336 us, max = 1.632 ms, min = 11.175 us, total = 19.969 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 48 total (1 active, 1 running), Execution time: mean = 2.561 ms, total = 122.945 ms, Queueing time: mean = 58.874 us, max = 172.215 us, min = 16.335 us, total = 2.826 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, 
Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
339.443 us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:01:16,211 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:01:16,342 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 256822 total (35 active) [state-dump] Queueing time: mean = 13.916 ms, 
max = 590.169 s, min = -0.000 s, total = 3573.944 s [state-dump] Execution time: mean = 9.504 ms, total = 2440.949 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 61719 total (0 active), Execution time: mean = 33.902 us, total = 2.092 s, Queueing time: mean = 101.761 us, max = 23.460 ms, min = 2.000 us, total = 6.281 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 61719 total (0 active), Execution time: mean = 485.033 us, total = 29.936 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 29375 total (1 active), Execution time: mean = 9.898 us, total = 290.740 ms, Queueing time: mean = 82.223 us, max = 6.066 ms, min = -0.000 s, total = 2.415 s [state-dump] NodeManager.CheckGC - 29375 total (1 active), Execution time: mean = 3.116 us, total = 91.543 ms, Queueing time: mean = 88.084 us, max = 6.066 ms, min = 3.126 us, total = 2.587 s [state-dump] ObjectManager.UpdateAvailableMemory - 29374 total (0 active), Execution time: mean = 5.324 us, total = 156.392 ms, Queueing time: mean = 95.670 us, max = 1.104 ms, min = 2.197 us, total = 2.810 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14695 total (1 active), Execution time: mean = 16.825 us, total = 247.243 ms, Queueing time: mean = 68.552 us, max = 2.889 ms, min = 5.101 us, total = 1.007 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11737 total (1 active), Execution time: mean = 438.090 us, total = 5.142 s, Queueing time: mean = 68.862 us, max = 13.366 ms, min = 93.000 ns, total = 808.238 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2940 total (1 active), Execution time: mean = 8.713 us, total = 25.615 ms, Queueing time: mean = 171.942 us, max = 3.537 ms, min = -0.000 s, total = 505.510 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2940 total (1 active), Execution time: mean = 15.223 us, 
total = 44.756 ms, Queueing time: mean = 64.944 us, max = 2.658 ms, min = 7.553 us, total = 190.936 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2940 total (1 active), Execution time: mean = 3.231 us, total = 9.500 ms, Queueing time: mean = 175.602 us, max = 3.551 ms, min = 4.207 us, total = 516.270 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2939 total (0 active), Execution time: mean = 588.157 us, total = 1.729 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2939 total (0 active), Execution time: mean = 103.211 us, total = 303.337 ms, Queueing time: mean = 101.490 us, max = 901.407 us, min = 6.667 us, total = 298.279 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 981 total (1 active), Execution time: mean = 8.268 us, total = 8.110 ms, Queueing time: mean = 66.815 us, max = 496.804 us, min = 12.362 us, total = 65.545 ms [state-dump] NodeManager.deadline_timer.record_metrics - 588 total (1 active), Execution time: mean = 520.815 us, total = 306.239 ms, Queueing time: mean = 371.181 us, max = 2.197 ms, min = 8.783 us, total = 218.254 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 588 total (0 active), Execution time: mean = 1.413 ms, total = 830.726 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 588 total (1 active), Execution time: mean = 292.272 us, total = 171.856 ms, Queueing time: mean = 599.126 us, max = 2.307 ms, min = 5.323 us, total = 352.286 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 588 total (0 active), Execution time: mean = 50.201 us, total = 29.518 ms, Queueing time: mean = 99.413 us, max = 237.873 us, min = 6.906 us, total = 58.455 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 294 total (1 active), 
Execution time: mean = 1.712 ms, total = 503.190 ms, Queueing time: mean = 69.706 us, max = 1.632 ms, min = 11.175 us, total = 20.494 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 49 total (1 active, 1 running), Execution time: mean = 2.566 ms, total = 125.716 ms, Queueing time: mean = 59.812 us, max = 172.215 us, min = 16.335 us, total = 2.931 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.793 s, total = 2398.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 339.443 
us, total = 1.697 ms, Queueing time: mean = 94.919 us, max = 184.802 us, min = 20.320 us, total = 474.594 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:02:16,211 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:02:16,344 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 262059 total (35 active) [state-dump] Queueing time: mean = 13.639 ms, 
max = 590.169 s, min = -0.000 s, total = 3574.202 s [state-dump] Execution time: mean = 11.607 ms, total = 3041.599 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 62979 total (0 active), Execution time: mean = 33.742 us, total = 2.125 s, Queueing time: mean = 101.080 us, max = 23.460 ms, min = 2.000 us, total = 6.366 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 62979 total (0 active), Execution time: mean = 482.317 us, total = 30.376 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 29975 total (1 active), Execution time: mean = 9.861 us, total = 295.574 ms, Queueing time: mean = 81.821 us, max = 6.066 ms, min = -0.000 s, total = 2.453 s [state-dump] NodeManager.CheckGC - 29975 total (1 active), Execution time: mean = 3.110 us, total = 93.212 ms, Queueing time: mean = 87.655 us, max = 6.066 ms, min = 3.126 us, total = 2.627 s [state-dump] ObjectManager.UpdateAvailableMemory - 29974 total (0 active), Execution time: mean = 5.294 us, total = 158.680 ms, Queueing time: mean = 94.797 us, max = 1.104 ms, min = 2.197 us, total = 2.841 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14995 total (1 active), Execution time: mean = 16.752 us, total = 251.192 ms, Queueing time: mean = 68.146 us, max = 2.889 ms, min = 5.101 us, total = 1.022 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11977 total (1 active), Execution time: mean = 437.509 us, total = 5.240 s, Queueing time: mean = 68.404 us, max = 13.366 ms, min = 93.000 ns, total = 819.274 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3000 total (1 active), Execution time: mean = 8.690 us, total = 26.070 ms, Queueing time: mean = 171.685 us, max = 3.537 ms, min = -0.000 s, total = 515.056 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3000 total (1 active), Execution time: mean = 15.153 us, 
total = 45.458 ms, Queueing time: mean = 64.561 us, max = 2.658 ms, min = 7.553 us, total = 193.682 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3000 total (1 active), Execution time: mean = 3.230 us, total = 9.689 ms, Queueing time: mean = 175.327 us, max = 3.551 ms, min = 4.207 us, total = 525.982 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2999 total (0 active), Execution time: mean = 584.562 us, total = 1.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2999 total (0 active), Execution time: mean = 102.852 us, total = 308.452 ms, Queueing time: mean = 100.524 us, max = 901.407 us, min = 6.667 us, total = 301.471 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1001 total (1 active), Execution time: mean = 8.247 us, total = 8.255 ms, Queueing time: mean = 66.337 us, max = 496.804 us, min = 12.362 us, total = 66.404 ms [state-dump] NodeManager.deadline_timer.record_metrics - 600 total (1 active), Execution time: mean = 520.684 us, total = 312.411 ms, Queueing time: mean = 370.585 us, max = 2.197 ms, min = 8.783 us, total = 222.351 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 600 total (0 active), Execution time: mean = 1.409 ms, total = 845.373 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 600 total (1 active), Execution time: mean = 292.190 us, total = 175.314 ms, Queueing time: mean = 598.355 us, max = 2.307 ms, min = 5.323 us, total = 359.013 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 600 total (0 active), Execution time: mean = 50.090 us, total = 30.054 ms, Queueing time: mean = 98.664 us, max = 237.873 us, min = 6.906 us, total = 59.199 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 300 total (1 active), 
Execution time: mean = 1.710 ms, total = 513.121 ms, Queueing time: mean = 69.633 us, max = 1.632 ms, min = 11.175 us, total = 20.890 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 50 total (1 active, 1 running), Execution time: mean = 2.559 ms, total = 127.969 ms, Queueing time: mean = 59.650 us, max = 172.215 us, min = 16.335 us, total = 2.982 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:03:16,211 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:03:16,347 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 267291 total (35 active) [state-dump] Queueing time: mean = 13.373 ms, 
max = 590.169 s, min = -0.000 s, total = 3574.518 s [state-dump] Execution time: mean = 11.382 ms, total = 3042.319 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 64239 total (0 active), Execution time: mean = 33.580 us, total = 2.157 s, Queueing time: mean = 100.664 us, max = 23.460 ms, min = 2.000 us, total = 6.467 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 64239 total (0 active), Execution time: mean = 480.668 us, total = 30.878 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 30574 total (1 active), Execution time: mean = 9.844 us, total = 300.973 ms, Queueing time: mean = 81.900 us, max = 6.066 ms, min = -0.000 s, total = 2.504 s [state-dump] NodeManager.CheckGC - 30574 total (1 active), Execution time: mean = 3.109 us, total = 95.042 ms, Queueing time: mean = 87.722 us, max = 6.066 ms, min = 3.126 us, total = 2.682 s [state-dump] ObjectManager.UpdateAvailableMemory - 30573 total (0 active), Execution time: mean = 5.276 us, total = 161.310 ms, Queueing time: mean = 94.226 us, max = 1.104 ms, min = 2.197 us, total = 2.881 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15295 total (1 active), Execution time: mean = 16.716 us, total = 255.665 ms, Queueing time: mean = 67.944 us, max = 2.889 ms, min = 5.101 us, total = 1.039 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12217 total (1 active), Execution time: mean = 437.270 us, total = 5.342 s, Queueing time: mean = 68.164 us, max = 13.366 ms, min = 93.000 ns, total = 832.758 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3060 total (1 active), Execution time: mean = 8.668 us, total = 26.525 ms, Queueing time: mean = 171.556 us, max = 3.537 ms, min = -0.000 s, total = 524.961 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3060 total (1 active), Execution time: mean = 15.132 us, 
total = 46.304 ms, Queueing time: mean = 64.313 us, max = 2.658 ms, min = 7.553 us, total = 196.799 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3060 total (1 active), Execution time: mean = 3.225 us, total = 9.869 ms, Queueing time: mean = 175.187 us, max = 3.551 ms, min = 4.207 us, total = 536.073 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3059 total (0 active), Execution time: mean = 582.268 us, total = 1.781 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3059 total (0 active), Execution time: mean = 102.538 us, total = 313.662 ms, Queueing time: mean = 99.852 us, max = 901.407 us, min = 6.667 us, total = 305.449 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1021 total (1 active), Execution time: mean = 8.230 us, total = 8.402 ms, Queueing time: mean = 66.027 us, max = 496.804 us, min = 12.362 us, total = 67.414 ms [state-dump] NodeManager.deadline_timer.record_metrics - 612 total (1 active), Execution time: mean = 519.895 us, total = 318.176 ms, Queueing time: mean = 370.080 us, max = 2.197 ms, min = 8.783 us, total = 226.489 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 612 total (0 active), Execution time: mean = 1.404 ms, total = 859.189 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 612 total (1 active), Execution time: mean = 292.105 us, total = 178.768 ms, Queueing time: mean = 597.316 us, max = 2.307 ms, min = 5.323 us, total = 365.558 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 612 total (0 active), Execution time: mean = 49.941 us, total = 30.564 ms, Queueing time: mean = 97.797 us, max = 237.873 us, min = 6.906 us, total = 59.852 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 306 total (1 active), 
Execution time: mean = 1.709 ms, total = 522.856 ms, Queueing time: mean = 69.429 us, max = 1.632 ms, min = 11.175 us, total = 21.245 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 51 total (1 active, 1 running), Execution time: mean = 2.541 ms, total = 129.599 ms, Queueing time: mean = 58.852 us, max = 172.215 us, min = 16.335 us, total = 3.001 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:04:16,212 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:04:16,349 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 272525 total (35 active) [state-dump] Queueing time: mean = 13.118 ms, 
max = 590.169 s, min = -0.000 s, total = 3574.906 s [state-dump] Execution time: mean = 11.166 ms, total = 3043.138 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 65499 total (0 active), Execution time: mean = 33.532 us, total = 2.196 s, Queueing time: mean = 100.654 us, max = 23.460 ms, min = 2.000 us, total = 6.593 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 65499 total (0 active), Execution time: mean = 480.281 us, total = 31.458 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 31174 total (1 active), Execution time: mean = 9.843 us, total = 306.843 ms, Queueing time: mean = 82.167 us, max = 6.066 ms, min = -0.000 s, total = 2.561 s [state-dump] NodeManager.CheckGC - 31174 total (1 active), Execution time: mean = 3.110 us, total = 96.963 ms, Queueing time: mean = 87.985 us, max = 6.066 ms, min = 3.126 us, total = 2.743 s [state-dump] ObjectManager.UpdateAvailableMemory - 31173 total (0 active), Execution time: mean = 5.270 us, total = 164.271 ms, Queueing time: mean = 94.311 us, max = 1.104 ms, min = 2.197 us, total = 2.940 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15595 total (1 active), Execution time: mean = 16.751 us, total = 261.227 ms, Queueing time: mean = 68.027 us, max = 2.889 ms, min = 5.101 us, total = 1.061 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12456 total (1 active), Execution time: mean = 437.122 us, total = 5.445 s, Queueing time: mean = 68.169 us, max = 13.366 ms, min = 93.000 ns, total = 849.110 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3120 total (1 active), Execution time: mean = 8.674 us, total = 27.064 ms, Queueing time: mean = 171.647 us, max = 3.537 ms, min = -0.000 s, total = 535.537 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3120 total (1 active), Execution time: mean = 15.143 us, 
total = 47.247 ms, Queueing time: mean = 64.443 us, max = 2.658 ms, min = 7.553 us, total = 201.061 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3120 total (1 active), Execution time: mean = 3.221 us, total = 10.051 ms, Queueing time: mean = 175.279 us, max = 3.551 ms, min = 4.207 us, total = 546.871 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3119 total (0 active), Execution time: mean = 581.338 us, total = 1.813 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3119 total (0 active), Execution time: mean = 102.345 us, total = 319.214 ms, Queueing time: mean = 99.719 us, max = 901.407 us, min = 6.667 us, total = 311.024 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1041 total (1 active), Execution time: mean = 8.214 us, total = 8.551 ms, Queueing time: mean = 65.905 us, max = 496.804 us, min = 12.362 us, total = 68.607 ms [state-dump] NodeManager.deadline_timer.record_metrics - 624 total (1 active), Execution time: mean = 520.361 us, total = 324.705 ms, Queueing time: mean = 370.037 us, max = 2.197 ms, min = 8.783 us, total = 230.903 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 624 total (0 active), Execution time: mean = 1.405 ms, total = 876.468 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 624 total (1 active), Execution time: mean = 292.189 us, total = 182.326 ms, Queueing time: mean = 597.700 us, max = 2.307 ms, min = 5.323 us, total = 372.965 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 624 total (0 active), Execution time: mean = 50.048 us, total = 31.230 ms, Queueing time: mean = 97.718 us, max = 237.873 us, min = 6.906 us, total = 60.976 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 312 total (1 active), 
Execution time: mean = 1.708 ms, total = 532.943 ms, Queueing time: mean = 69.530 us, max = 1.632 ms, min = 11.175 us, total = 21.693 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 52 total (1 active, 1 running), Execution time: mean = 2.548 ms, total = 132.504 ms, Queueing time: mean = 59.525 us, max = 172.215 us, min = 16.335 us, total = 3.095 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:05:16,212 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:05:16,352 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 277757 total (35 active) [state-dump] Queueing time: mean = 12.872 ms, 
max = 590.169 s, min = -0.000 s, total = 3575.305 s [state-dump] Execution time: mean = 10.959 ms, total = 3044.027 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 66759 total (0 active), Execution time: mean = 33.517 us, total = 2.238 s, Queueing time: mean = 100.677 us, max = 23.460 ms, min = 2.000 us, total = 6.721 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 66759 total (0 active), Execution time: mean = 480.719 us, total = 32.092 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 31773 total (1 active), Execution time: mean = 9.865 us, total = 313.453 ms, Queueing time: mean = 82.361 us, max = 6.066 ms, min = -0.000 s, total = 2.617 s [state-dump] NodeManager.CheckGC - 31773 total (1 active), Execution time: mean = 3.115 us, total = 98.958 ms, Queueing time: mean = 88.197 us, max = 6.066 ms, min = 3.126 us, total = 2.802 s [state-dump] ObjectManager.UpdateAvailableMemory - 31772 total (0 active), Execution time: mean = 5.279 us, total = 167.730 ms, Queueing time: mean = 94.599 us, max = 1.104 ms, min = 2.197 us, total = 3.006 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15895 total (1 active), Execution time: mean = 16.808 us, total = 267.170 ms, Queueing time: mean = 68.145 us, max = 2.889 ms, min = -0.000 s, total = 1.083 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12696 total (1 active), Execution time: mean = 437.405 us, total = 5.553 s, Queueing time: mean = 68.324 us, max = 13.366 ms, min = 93.000 ns, total = 867.437 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3180 total (1 active), Execution time: mean = 8.683 us, total = 27.612 ms, Queueing time: mean = 171.962 us, max = 3.537 ms, min = -0.000 s, total = 546.840 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3180 total (1 active), Execution time: mean = 15.196 us, 
total = 48.324 ms, Queueing time: mean = 64.609 us, max = 2.658 ms, min = 7.553 us, total = 205.458 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3180 total (1 active), Execution time: mean = 3.219 us, total = 10.238 ms, Queueing time: mean = 175.601 us, max = 3.551 ms, min = 4.207 us, total = 558.412 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3179 total (0 active), Execution time: mean = 581.603 us, total = 1.849 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3179 total (0 active), Execution time: mean = 102.169 us, total = 324.796 ms, Queueing time: mean = 99.852 us, max = 901.407 us, min = 6.667 us, total = 317.429 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1061 total (1 active), Execution time: mean = 8.235 us, total = 8.737 ms, Queueing time: mean = 66.153 us, max = 496.804 us, min = 12.362 us, total = 70.188 ms [state-dump] NodeManager.deadline_timer.record_metrics - 636 total (1 active), Execution time: mean = 521.321 us, total = 331.560 ms, Queueing time: mean = 370.767 us, max = 2.197 ms, min = 8.783 us, total = 235.808 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 636 total (0 active), Execution time: mean = 1.407 ms, total = 894.844 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 636 total (1 active), Execution time: mean = 292.520 us, total = 186.043 ms, Queueing time: mean = 599.096 us, max = 2.307 ms, min = 5.323 us, total = 381.025 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 636 total (0 active), Execution time: mean = 50.209 us, total = 31.933 ms, Queueing time: mean = 98.137 us, max = 237.873 us, min = 6.906 us, total = 62.415 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 318 total (1 active), 
Execution time: mean = 1.712 ms, total = 544.446 ms, Queueing time: mean = 69.470 us, max = 1.632 ms, min = 11.175 us, total = 22.091 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 53 total (1 active, 1 running), Execution time: mean = 2.542 ms, total = 134.711 ms, Queueing time: mean = 59.776 us, max = 172.215 us, min = 16.335 us, total = 3.168 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:06:16,212 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:06:16,355 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 282988 total (35 active) [state-dump] Queueing time: mean = 12.635 ms, 
max = 590.169 s, min = -0.000 s, total = 3575.693 s [state-dump] Execution time: mean = 10.760 ms, total = 3044.914 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 68019 total (0 active), Execution time: mean = 33.527 us, total = 2.281 s, Queueing time: mean = 100.812 us, max = 23.460 ms, min = 2.000 us, total = 6.857 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 68019 total (0 active), Execution time: mean = 481.242 us, total = 32.734 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 32372 total (1 active), Execution time: mean = 9.868 us, total = 319.445 ms, Queueing time: mean = 82.549 us, max = 6.066 ms, min = -0.000 s, total = 2.672 s [state-dump] NodeManager.CheckGC - 32372 total (1 active), Execution time: mean = 3.116 us, total = 100.868 ms, Queueing time: mean = 88.386 us, max = 6.066 ms, min = 3.126 us, total = 2.861 s [state-dump] ObjectManager.UpdateAvailableMemory - 32371 total (0 active), Execution time: mean = 5.286 us, total = 171.129 ms, Queueing time: mean = 94.800 us, max = 1.104 ms, min = 2.197 us, total = 3.069 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16195 total (1 active), Execution time: mean = 16.828 us, total = 272.534 ms, Queueing time: mean = 68.170 us, max = 2.889 ms, min = -0.000 s, total = 1.104 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12935 total (1 active), Execution time: mean = 437.457 us, total = 5.659 s, Queueing time: mean = 68.375 us, max = 13.366 ms, min = 93.000 ns, total = 884.429 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3240 total (1 active), Execution time: mean = 8.689 us, total = 28.152 ms, Queueing time: mean = 171.268 us, max = 3.537 ms, min = -0.000 s, total = 554.907 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3240 total (1 active), Execution time: mean = 15.212 us, 
total = 49.288 ms, Queueing time: mean = 64.641 us, max = 2.658 ms, min = 7.553 us, total = 209.436 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3240 total (1 active), Execution time: mean = 3.225 us, total = 10.451 ms, Queueing time: mean = 174.902 us, max = 3.551 ms, min = 4.207 us, total = 566.684 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3239 total (0 active), Execution time: mean = 581.738 us, total = 1.884 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3239 total (0 active), Execution time: mean = 102.087 us, total = 330.660 ms, Queueing time: mean = 99.854 us, max = 901.407 us, min = 6.667 us, total = 323.426 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1081 total (1 active), Execution time: mean = 8.251 us, total = 8.919 ms, Queueing time: mean = 66.349 us, max = 496.804 us, min = 12.362 us, total = 71.723 ms [state-dump] NodeManager.deadline_timer.record_metrics - 648 total (1 active), Execution time: mean = 520.665 us, total = 337.391 ms, Queueing time: mean = 367.979 us, max = 2.197 ms, min = 8.783 us, total = 238.450 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 648 total (0 active), Execution time: mean = 1.407 ms, total = 911.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 648 total (1 active), Execution time: mean = 292.956 us, total = 189.836 ms, Queueing time: mean = 595.164 us, max = 2.307 ms, min = 5.323 us, total = 385.667 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 648 total (0 active), Execution time: mean = 50.202 us, total = 32.531 ms, Queueing time: mean = 98.031 us, max = 237.873 us, min = 6.906 us, total = 63.524 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 324 total (1 active), 
Execution time: mean = 1.706 ms, total = 552.649 ms, Queueing time: mean = 69.405 us, max = 1.632 ms, min = 11.175 us, total = 22.487 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 54 total (1 active, 1 running), Execution time: mean = 2.545 ms, total = 137.431 ms, Queueing time: mean = 60.164 us, max = 172.215 us, min = 16.335 us, total = 3.249 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:07:16,213 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:07:16,358 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 288223 total (35 active) [state-dump] Queueing time: mean = 12.407 ms, 
max = 590.169 s, min = -0.000 s, total = 3575.906 s [state-dump] Execution time: mean = 10.566 ms, total = 3045.464 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 69279 total (0 active), Execution time: mean = 33.274 us, total = 2.305 s, Queueing time: mean = 99.656 us, max = 23.460 ms, min = 2.000 us, total = 6.904 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 69279 total (0 active), Execution time: mean = 477.477 us, total = 33.079 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 32972 total (1 active), Execution time: mean = 9.833 us, total = 324.211 ms, Queueing time: mean = 82.195 us, max = 6.066 ms, min = -0.000 s, total = 2.710 s [state-dump] NodeManager.CheckGC - 32972 total (1 active), Execution time: mean = 3.111 us, total = 102.586 ms, Queueing time: mean = 88.004 us, max = 6.066 ms, min = 3.126 us, total = 2.902 s [state-dump] ObjectManager.UpdateAvailableMemory - 32971 total (0 active), Execution time: mean = 5.253 us, total = 173.211 ms, Queueing time: mean = 93.736 us, max = 1.104 ms, min = 2.197 us, total = 3.091 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16495 total (1 active), Execution time: mean = 16.770 us, total = 276.621 ms, Queueing time: mean = 67.788 us, max = 2.889 ms, min = -0.000 s, total = 1.118 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13175 total (1 active), Execution time: mean = 437.102 us, total = 5.759 s, Queueing time: mean = 67.913 us, max = 13.366 ms, min = 93.000 ns, total = 894.757 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3300 total (1 active), Execution time: mean = 8.650 us, total = 28.545 ms, Queueing time: mean = 171.346 us, max = 3.537 ms, min = -0.000 s, total = 565.442 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3300 total (1 active), Execution time: mean = 15.189 us, 
total = 50.123 ms, Queueing time: mean = 64.399 us, max = 2.658 ms, min = 7.553 us, total = 212.517 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3300 total (1 active), Execution time: mean = 3.218 us, total = 10.620 ms, Queueing time: mean = 174.960 us, max = 3.551 ms, min = 4.207 us, total = 577.368 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3299 total (0 active), Execution time: mean = 578.024 us, total = 1.907 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3299 total (0 active), Execution time: mean = 101.693 us, total = 335.485 ms, Queueing time: mean = 98.696 us, max = 901.407 us, min = 6.667 us, total = 325.599 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1101 total (1 active), Execution time: mean = 8.219 us, total = 9.049 ms, Queueing time: mean = 65.863 us, max = 496.804 us, min = 12.362 us, total = 72.516 ms [state-dump] NodeManager.deadline_timer.record_metrics - 660 total (1 active), Execution time: mean = 520.122 us, total = 343.280 ms, Queueing time: mean = 369.360 us, max = 2.197 ms, min = 8.783 us, total = 243.777 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 660 total (0 active), Execution time: mean = 1.403 ms, total = 926.065 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 660 total (1 active), Execution time: mean = 292.841 us, total = 193.275 ms, Queueing time: mean = 595.989 us, max = 2.307 ms, min = 5.323 us, total = 393.352 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 660 total (0 active), Execution time: mean = 50.092 us, total = 33.061 ms, Queueing time: mean = 97.242 us, max = 237.873 us, min = 6.906 us, total = 64.180 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 330 total (1 active), 
Execution time: mean = 1.708 ms, total = 563.597 ms, Queueing time: mean = 69.117 us, max = 1.632 ms, min = 11.175 us, total = 22.809 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 55 total (1 active, 1 running), Execution time: mean = 2.554 ms, total = 140.483 ms, Queueing time: mean = 60.495 us, max = 172.215 us, min = 16.335 us, total = 3.327 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:08:16,213 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:08:16,361 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 293456 total (35 active) [state-dump] Queueing time: mean = 12.186 ms, 
max = 590.169 s, min = -0.000 s, total = 3576.064 s [state-dump] Execution time: mean = 10.379 ms, total = 3045.913 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 70539 total (0 active), Execution time: mean = 32.980 us, total = 2.326 s, Queueing time: mean = 98.193 us, max = 23.460 ms, min = 2.000 us, total = 6.926 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 70539 total (0 active), Execution time: mean = 472.644 us, total = 33.340 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 33572 total (1 active), Execution time: mean = 9.771 us, total = 328.026 ms, Queueing time: mean = 81.609 us, max = 6.066 ms, min = -0.000 s, total = 2.740 s [state-dump] NodeManager.CheckGC - 33572 total (1 active), Execution time: mean = 3.103 us, total = 104.165 ms, Queueing time: mean = 87.369 us, max = 6.066 ms, min = 3.126 us, total = 2.933 s [state-dump] ObjectManager.UpdateAvailableMemory - 33571 total (0 active), Execution time: mean = 5.210 us, total = 174.889 ms, Queueing time: mean = 92.429 us, max = 1.104 ms, min = 2.197 us, total = 3.103 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16794 total (1 active), Execution time: mean = 16.658 us, total = 279.749 ms, Queueing time: mean = 67.286 us, max = 2.889 ms, min = -0.000 s, total = 1.130 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13414 total (1 active), Execution time: mean = 436.411 us, total = 5.854 s, Queueing time: mean = 67.385 us, max = 13.366 ms, min = 93.000 ns, total = 903.899 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3360 total (1 active), Execution time: mean = 8.606 us, total = 28.917 ms, Queueing time: mean = 171.664 us, max = 3.537 ms, min = -0.000 s, total = 576.791 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3360 total (1 active), Execution time: mean = 15.117 us, 
total = 50.793 ms, Queueing time: mean = 63.947 us, max = 2.658 ms, min = 7.553 us, total = 214.863 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3360 total (1 active), Execution time: mean = 3.209 us, total = 10.784 ms, Queueing time: mean = 175.257 us, max = 3.551 ms, min = 4.207 us, total = 588.865 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3359 total (0 active), Execution time: mean = 573.105 us, total = 1.925 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3359 total (0 active), Execution time: mean = 101.247 us, total = 340.089 ms, Queueing time: mean = 97.264 us, max = 901.407 us, min = 6.667 us, total = 326.710 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1121 total (1 active), Execution time: mean = 8.189 us, total = 9.180 ms, Queueing time: mean = 65.368 us, max = 496.804 us, min = 12.362 us, total = 73.278 ms [state-dump] NodeManager.deadline_timer.record_metrics - 672 total (1 active), Execution time: mean = 520.190 us, total = 349.568 ms, Queueing time: mean = 370.551 us, max = 2.197 ms, min = 8.783 us, total = 249.010 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 672 total (0 active), Execution time: mean = 1.397 ms, total = 938.618 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 672 total (1 active), Execution time: mean = 292.440 us, total = 196.520 ms, Queueing time: mean = 597.643 us, max = 2.307 ms, min = 5.323 us, total = 401.616 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 672 total (0 active), Execution time: mean = 49.917 us, total = 33.544 ms, Queueing time: mean = 96.048 us, max = 237.873 us, min = 6.906 us, total = 64.544 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 336 total (1 active), 
Execution time: mean = 1.711 ms, total = 575.038 ms, Queueing time: mean = 68.394 us, max = 1.632 ms, min = 11.175 us, total = 22.980 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 56 total (1 active, 1 running), Execution time: mean = 2.559 ms, total = 143.302 ms, Queueing time: mean = 59.706 us, max = 172.215 us, min = 16.311 us, total = 3.344 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:09:16,213 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:09:16,364 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 298688 total (35 active) [state-dump] Queueing time: mean = 11.973 ms, 
max = 590.169 s, min = -0.000 s, total = 3576.248 s [state-dump] Execution time: mean = 10.199 ms, total = 3046.413 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 71799 total (0 active), Execution time: mean = 32.732 us, total = 2.350 s, Queueing time: mean = 96.996 us, max = 23.460 ms, min = 2.000 us, total = 6.964 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 71799 total (0 active), Execution time: mean = 468.583 us, total = 33.644 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 34171 total (1 active), Execution time: mean = 9.728 us, total = 332.424 ms, Queueing time: mean = 81.135 us, max = 6.066 ms, min = -0.000 s, total = 2.772 s [state-dump] NodeManager.CheckGC - 34171 total (1 active), Execution time: mean = 3.096 us, total = 105.807 ms, Queueing time: mean = 86.866 us, max = 6.066 ms, min = 3.126 us, total = 2.968 s [state-dump] ObjectManager.UpdateAvailableMemory - 34170 total (0 active), Execution time: mean = 5.175 us, total = 176.815 ms, Queueing time: mean = 91.392 us, max = 1.104 ms, min = 2.197 us, total = 3.123 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17094 total (1 active), Execution time: mean = 16.568 us, total = 283.212 ms, Queueing time: mean = 66.789 us, max = 2.889 ms, min = -0.000 s, total = 1.142 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13654 total (1 active), Execution time: mean = 435.999 us, total = 5.953 s, Queueing time: mean = 66.848 us, max = 13.366 ms, min = 93.000 ns, total = 912.742 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3420 total (1 active), Execution time: mean = 8.579 us, total = 29.339 ms, Queueing time: mean = 171.632 us, max = 3.537 ms, min = -0.000 s, total = 586.983 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3420 total (1 active), Execution time: mean = 15.039 us, 
total = 51.432 ms, Queueing time: mean = 63.392 us, max = 2.658 ms, min = 7.553 us, total = 216.800 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3420 total (1 active), Execution time: mean = 3.203 us, total = 10.954 ms, Queueing time: mean = 175.212 us, max = 3.551 ms, min = 4.207 us, total = 599.225 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3419 total (0 active), Execution time: mean = 568.966 us, total = 1.945 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3419 total (0 active), Execution time: mean = 100.915 us, total = 345.027 ms, Queueing time: mean = 96.141 us, max = 901.407 us, min = 6.667 us, total = 328.707 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1141 total (1 active), Execution time: mean = 8.161 us, total = 9.312 ms, Queueing time: mean = 64.873 us, max = 496.804 us, min = 12.362 us, total = 74.021 ms [state-dump] NodeManager.deadline_timer.record_metrics - 684 total (1 active), Execution time: mean = 519.610 us, total = 355.414 ms, Queueing time: mean = 370.958 us, max = 2.197 ms, min = 8.783 us, total = 253.735 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 684 total (0 active), Execution time: mean = 1.391 ms, total = 951.398 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 684 total (1 active), Execution time: mean = 292.273 us, total = 199.915 ms, Queueing time: mean = 597.704 us, max = 2.307 ms, min = 5.323 us, total = 408.830 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 684 total (0 active), Execution time: mean = 49.790 us, total = 34.057 ms, Queueing time: mean = 95.116 us, max = 237.873 us, min = 6.906 us, total = 65.059 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 342 total (1 active), 
Execution time: mean = 1.711 ms, total = 585.217 ms, Queueing time: mean = 67.960 us, max = 1.632 ms, min = 11.175 us, total = 23.242 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 57 total (1 active, 1 running), Execution time: mean = 2.564 ms, total = 146.153 ms, Queueing time: mean = 59.162 us, max = 172.215 us, min = 16.311 us, total = 3.372 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:10:16,213 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:10:16,367 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 303922 total (35 active) [state-dump] Queueing time: mean = 11.768 ms, 
max = 590.169 s, min = -0.000 s, total = 3576.442 s [state-dump] Execution time: mean = 10.025 ms, total = 3046.947 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 73059 total (0 active), Execution time: mean = 32.535 us, total = 2.377 s, Queueing time: mean = 95.993 us, max = 23.460 ms, min = 2.000 us, total = 7.013 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 73059 total (0 active), Execution time: mean = 465.083 us, total = 33.979 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 34771 total (1 active), Execution time: mean = 9.695 us, total = 337.104 ms, Queueing time: mean = 80.690 us, max = 6.066 ms, min = -0.000 s, total = 2.806 s [state-dump] NodeManager.CheckGC - 34771 total (1 active), Execution time: mean = 3.092 us, total = 107.511 ms, Queueing time: mean = 86.392 us, max = 6.066 ms, min = 3.126 us, total = 3.004 s [state-dump] ObjectManager.UpdateAvailableMemory - 34770 total (0 active), Execution time: mean = 5.146 us, total = 178.941 ms, Queueing time: mean = 90.425 us, max = 1.104 ms, min = 2.197 us, total = 3.144 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17394 total (1 active), Execution time: mean = 16.481 us, total = 286.677 ms, Queueing time: mean = 66.318 us, max = 2.889 ms, min = -0.000 s, total = 1.154 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13893 total (1 active), Execution time: mean = 435.630 us, total = 6.052 s, Queueing time: mean = 66.422 us, max = 13.366 ms, min = 93.000 ns, total = 922.801 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3480 total (1 active), Execution time: mean = 8.543 us, total = 29.730 ms, Queueing time: mean = 171.217 us, max = 3.537 ms, min = -0.000 s, total = 595.835 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3480 total (1 active), Execution time: mean = 15.008 us, 
total = 52.228 ms, Queueing time: mean = 63.067 us, max = 2.658 ms, min = 7.553 us, total = 219.473 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3480 total (1 active), Execution time: mean = 3.196 us, total = 11.122 ms, Queueing time: mean = 174.779 us, max = 3.551 ms, min = 4.207 us, total = 608.230 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3479 total (0 active), Execution time: mean = 565.406 us, total = 1.967 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3479 total (0 active), Execution time: mean = 100.625 us, total = 350.073 ms, Queueing time: mean = 95.079 us, max = 901.407 us, min = 6.667 us, total = 330.781 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1161 total (1 active), Execution time: mean = 8.119 us, total = 9.426 ms, Queueing time: mean = 64.405 us, max = 496.804 us, min = 12.362 us, total = 74.774 ms [state-dump] NodeManager.deadline_timer.record_metrics - 696 total (1 active), Execution time: mean = 518.296 us, total = 360.734 ms, Queueing time: mean = 369.902 us, max = 2.197 ms, min = 8.783 us, total = 257.452 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 696 total (0 active), Execution time: mean = 1.384 ms, total = 963.433 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 696 total (1 active), Execution time: mean = 291.930 us, total = 203.183 ms, Queueing time: mean = 595.668 us, max = 2.307 ms, min = 5.323 us, total = 414.585 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 696 total (0 active), Execution time: mean = 49.586 us, total = 34.512 ms, Queueing time: mean = 93.864 us, max = 237.873 us, min = 6.906 us, total = 65.330 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 348 total (1 active), 
Execution time: mean = 1.707 ms, total = 594.155 ms, Queueing time: mean = 67.228 us, max = 1.632 ms, min = 11.175 us, total = 23.395 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 58 total (1 active, 1 running), Execution time: mean = 2.569 ms, total = 148.999 ms, Queueing time: mean = 58.830 us, max = 172.215 us, min = 16.311 us, total = 3.412 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:11:16,214 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:11:16,369 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 309157 total (35 active) [state-dump] Queueing time: mean = 11.570 ms, 
max = 590.169 s, min = -0.000 s, total = 3576.852 s [state-dump] Execution time: mean = 9.859 ms, total = 3047.898 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 74319 total (0 active), Execution time: mean = 32.616 us, total = 2.424 s, Queueing time: mean = 96.308 us, max = 23.460 ms, min = 2.000 us, total = 7.157 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 74319 total (0 active), Execution time: mean = 466.505 us, total = 34.670 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 35371 total (1 active), Execution time: mean = 9.690 us, total = 342.759 ms, Queueing time: mean = 80.826 us, max = 6.066 ms, min = -0.000 s, total = 2.859 s [state-dump] NodeManager.CheckGC - 35371 total (1 active), Execution time: mean = 3.090 us, total = 109.302 ms, Queueing time: mean = 86.524 us, max = 6.066 ms, min = 3.126 us, total = 3.060 s [state-dump] ObjectManager.UpdateAvailableMemory - 35370 total (0 active), Execution time: mean = 5.159 us, total = 182.489 ms, Queueing time: mean = 90.900 us, max = 1.104 ms, min = 2.197 us, total = 3.215 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17694 total (1 active), Execution time: mean = 16.492 us, total = 291.815 ms, Queueing time: mean = 66.442 us, max = 2.889 ms, min = -0.000 s, total = 1.176 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14133 total (1 active), Execution time: mean = 435.822 us, total = 6.159 s, Queueing time: mean = 66.472 us, max = 13.366 ms, min = 93.000 ns, total = 939.449 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3540 total (1 active), Execution time: mean = 8.540 us, total = 30.231 ms, Queueing time: mean = 171.188 us, max = 3.537 ms, min = -0.000 s, total = 606.004 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3540 total (1 active), Execution time: mean = 15.023 us, 
total = 53.180 ms, Queueing time: mean = 63.188 us, max = 2.658 ms, min = 7.553 us, total = 223.685 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3540 total (1 active), Execution time: mean = 3.199 us, total = 11.326 ms, Queueing time: mean = 174.743 us, max = 3.551 ms, min = 4.207 us, total = 618.589 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3539 total (0 active), Execution time: mean = 566.667 us, total = 2.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3539 total (0 active), Execution time: mean = 100.717 us, total = 356.437 ms, Queueing time: mean = 95.479 us, max = 901.407 us, min = 6.667 us, total = 337.901 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1181 total (1 active), Execution time: mean = 8.121 us, total = 9.591 ms, Queueing time: mean = 64.532 us, max = 496.804 us, min = 12.362 us, total = 76.212 ms [state-dump] NodeManager.deadline_timer.record_metrics - 708 total (1 active), Execution time: mean = 518.395 us, total = 367.024 ms, Queueing time: mean = 369.609 us, max = 2.197 ms, min = 8.783 us, total = 261.683 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 708 total (0 active), Execution time: mean = 1.387 ms, total = 982.224 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 708 total (1 active), Execution time: mean = 292.360 us, total = 206.991 ms, Queueing time: mean = 595.029 us, max = 2.307 ms, min = 5.323 us, total = 421.281 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 708 total (0 active), Execution time: mean = 49.696 us, total = 35.184 ms, Queueing time: mean = 93.882 us, max = 237.873 us, min = 6.906 us, total = 66.469 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 354 total (1 active), 
Execution time: mean = 1.708 ms, total = 604.479 ms, Queueing time: mean = 66.848 us, max = 1.632 ms, min = 11.175 us, total = 23.664 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 59 total (1 active, 1 running), Execution time: mean = 2.576 ms, total = 151.985 ms, Queueing time: mean = 59.022 us, max = 172.215 us, min = 16.311 us, total = 3.482 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.394 s, total = 2998.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 330.012 
us, total = 1.980 ms, Queueing time: mean = 84.385 us, max = 184.802 us, min = 20.320 us, total = 506.307 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 5.370 us, total = 21.480 us, Queueing time: mean = 37.229 us, max = 57.450 us, min = 34.595 us, total = 148.915 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:12:16,214 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:12:16,372 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 314391 total (35 active) [state-dump] Queueing time: mean = 11.378 ms, 
max = 590.169 s, min = -0.000 s, total = 3577.268 s [state-dump] Execution time: mean = 11.606 ms, total = 3648.849 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 75579 total (0 active), Execution time: mean = 32.712 us, total = 2.472 s, Queueing time: mean = 96.628 us, max = 23.460 ms, min = 2.000 us, total = 7.303 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 75579 total (0 active), Execution time: mean = 467.782 us, total = 35.355 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 35970 total (1 active), Execution time: mean = 9.704 us, total = 349.040 ms, Queueing time: mean = 80.982 us, max = 6.066 ms, min = -0.000 s, total = 2.913 s [state-dump] NodeManager.CheckGC - 35970 total (1 active), Execution time: mean = 3.091 us, total = 111.181 ms, Queueing time: mean = 86.692 us, max = 6.066 ms, min = 3.126 us, total = 3.118 s [state-dump] ObjectManager.UpdateAvailableMemory - 35969 total (0 active), Execution time: mean = 5.175 us, total = 186.145 ms, Queueing time: mean = 91.276 us, max = 1.104 ms, min = 2.197 us, total = 3.283 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17994 total (1 active), Execution time: mean = 16.525 us, total = 297.352 ms, Queueing time: mean = 66.596 us, max = 2.889 ms, min = -0.000 s, total = 1.198 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14372 total (1 active), Execution time: mean = 436.142 us, total = 6.268 s, Queueing time: mean = 66.594 us, max = 13.366 ms, min = 93.000 ns, total = 957.086 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3600 total (1 active), Execution time: mean = 8.541 us, total = 30.747 ms, Queueing time: mean = 171.468 us, max = 3.537 ms, min = -0.000 s, total = 617.286 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3600 total (1 active), Execution time: mean = 15.046 us, 
total = 54.166 ms, Queueing time: mean = 63.284 us, max = 2.658 ms, min = 7.553 us, total = 227.823 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3600 total (1 active), Execution time: mean = 3.204 us, total = 11.535 ms, Queueing time: mean = 175.018 us, max = 3.551 ms, min = 4.207 us, total = 630.066 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3599 total (0 active), Execution time: mean = 567.791 us, total = 2.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3599 total (0 active), Execution time: mean = 100.800 us, total = 362.781 ms, Queueing time: mean = 95.826 us, max = 901.407 us, min = 6.667 us, total = 344.877 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1201 total (1 active), Execution time: mean = 8.157 us, total = 9.796 ms, Queueing time: mean = 64.643 us, max = 496.804 us, min = 12.362 us, total = 77.637 ms [state-dump] NodeManager.deadline_timer.record_metrics - 720 total (1 active), Execution time: mean = 518.670 us, total = 373.442 ms, Queueing time: mean = 370.626 us, max = 2.197 ms, min = 8.783 us, total = 266.850 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 720 total (0 active), Execution time: mean = 1.392 ms, total = 1.002 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 720 total (1 active), Execution time: mean = 293.064 us, total = 211.006 ms, Queueing time: mean = 595.757 us, max = 2.307 ms, min = 5.323 us, total = 428.945 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 720 total (0 active), Execution time: mean = 49.873 us, total = 35.909 ms, Queueing time: mean = 94.074 us, max = 237.873 us, min = 6.906 us, total = 67.734 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 360 total (1 active), 
Execution time: mean = 1.709 ms, total = 615.228 ms, Queueing time: mean = 66.768 us, max = 1.632 ms, min = 11.175 us, total = 24.037 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 60 total (1 active, 1 running), Execution time: mean = 2.578 ms, total = 154.680 ms, Queueing time: mean = 58.898 us, max = 172.215 us, min = 16.311 us, total = 3.534 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:13:16,215 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:13:16,374 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 319623 total (35 active) [state-dump] Queueing time: mean = 11.193 ms, 
max = 590.169 s, min = -0.000 s, total = 3577.695 s [state-dump] Execution time: mean = 11.419 ms, total = 3649.819 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 76839 total (0 active), Execution time: mean = 32.817 us, total = 2.522 s, Queueing time: mean = 97.008 us, max = 23.460 ms, min = 2.000 us, total = 7.454 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 76839 total (0 active), Execution time: mean = 469.268 us, total = 36.058 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 36569 total (1 active), Execution time: mean = 9.710 us, total = 355.093 ms, Queueing time: mean = 81.207 us, max = 6.066 ms, min = -0.000 s, total = 2.970 s [state-dump] NodeManager.CheckGC - 36569 total (1 active), Execution time: mean = 3.092 us, total = 113.058 ms, Queueing time: mean = 86.921 us, max = 6.066 ms, min = 3.126 us, total = 3.179 s [state-dump] ObjectManager.UpdateAvailableMemory - 36568 total (0 active), Execution time: mean = 5.191 us, total = 189.823 ms, Queueing time: mean = 91.704 us, max = 1.104 ms, min = 2.197 us, total = 3.353 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18294 total (1 active), Execution time: mean = 16.543 us, total = 302.642 ms, Queueing time: mean = 66.694 us, max = 2.889 ms, min = -0.000 s, total = 1.220 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14612 total (1 active), Execution time: mean = 436.389 us, total = 6.377 s, Queueing time: mean = 66.693 us, max = 13.366 ms, min = 93.000 ns, total = 974.524 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3660 total (1 active), Execution time: mean = 8.545 us, total = 31.273 ms, Queueing time: mean = 171.789 us, max = 3.537 ms, min = -0.000 s, total = 628.747 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3660 total (1 active), Execution time: mean = 15.053 us, 
total = 55.094 ms, Queueing time: mean = 63.375 us, max = 2.658 ms, min = 7.553 us, total = 231.954 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3660 total (1 active), Execution time: mean = 3.207 us, total = 11.736 ms, Queueing time: mean = 175.338 us, max = 3.551 ms, min = 4.207 us, total = 641.738 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3659 total (0 active), Execution time: mean = 568.922 us, total = 2.082 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3659 total (0 active), Execution time: mean = 100.959 us, total = 369.411 ms, Queueing time: mean = 96.070 us, max = 901.407 us, min = 6.667 us, total = 351.521 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1221 total (1 active), Execution time: mean = 8.172 us, total = 9.978 ms, Queueing time: mean = 64.814 us, max = 496.804 us, min = 12.362 us, total = 79.138 ms [state-dump] NodeManager.deadline_timer.record_metrics - 732 total (1 active), Execution time: mean = 519.510 us, total = 380.282 ms, Queueing time: mean = 371.508 us, max = 2.197 ms, min = 8.783 us, total = 271.944 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 732 total (0 active), Execution time: mean = 1.395 ms, total = 1.021 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 732 total (1 active), Execution time: mean = 293.817 us, total = 215.074 ms, Queueing time: mean = 596.661 us, max = 2.307 ms, min = 5.323 us, total = 436.756 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 732 total (0 active), Execution time: mean = 50.027 us, total = 36.620 ms, Queueing time: mean = 94.445 us, max = 237.873 us, min = 6.906 us, total = 69.134 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 366 total (1 active), 
Execution time: mean = 1.713 ms, total = 626.977 ms, Queueing time: mean = 66.896 us, max = 1.632 ms, min = 11.175 us, total = 24.484 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 61 total (1 active, 1 running), Execution time: mean = 2.579 ms, total = 157.310 ms, Queueing time: mean = 58.811 us, max = 172.215 us, min = 16.311 us, total = 3.587 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:14:16,215 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:14:16,377 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 324857 total (35 active) [state-dump] Queueing time: mean = 11.014 ms, 
max = 590.169 s, min = -0.000 s, total = 3578.091 s [state-dump] Execution time: mean = 11.238 ms, total = 3650.731 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 78099 total (0 active), Execution time: mean = 32.883 us, total = 2.568 s, Queueing time: mean = 97.217 us, max = 23.460 ms, min = 2.000 us, total = 7.593 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 78099 total (0 active), Execution time: mean = 470.031 us, total = 36.709 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 37169 total (1 active), Execution time: mean = 9.707 us, total = 360.806 ms, Queueing time: mean = 81.271 us, max = 6.066 ms, min = -0.000 s, total = 3.021 s [state-dump] NodeManager.CheckGC - 37169 total (1 active), Execution time: mean = 3.091 us, total = 114.908 ms, Queueing time: mean = 86.984 us, max = 6.066 ms, min = 3.126 us, total = 3.233 s [state-dump] ObjectManager.UpdateAvailableMemory - 37168 total (0 active), Execution time: mean = 5.201 us, total = 193.312 ms, Queueing time: mean = 91.947 us, max = 1.104 ms, min = 2.197 us, total = 3.417 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18594 total (1 active), Execution time: mean = 16.563 us, total = 307.964 ms, Queueing time: mean = 66.715 us, max = 2.889 ms, min = -0.000 s, total = 1.241 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14851 total (1 active), Execution time: mean = 436.657 us, total = 6.485 s, Queueing time: mean = 66.732 us, max = 13.366 ms, min = 93.000 ns, total = 991.042 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3720 total (1 active), Execution time: mean = 8.555 us, total = 31.824 ms, Queueing time: mean = 172.202 us, max = 3.537 ms, min = -0.000 s, total = 640.591 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3720 total (1 active), Execution time: mean = 15.052 us, 
total = 55.994 ms, Queueing time: mean = 63.342 us, max = 2.658 ms, min = 7.553 us, total = 235.631 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3720 total (1 active), Execution time: mean = 3.210 us, total = 11.941 ms, Queueing time: mean = 175.752 us, max = 3.551 ms, min = 4.207 us, total = 653.799 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3719 total (0 active), Execution time: mean = 569.481 us, total = 2.118 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3719 total (0 active), Execution time: mean = 100.972 us, total = 375.515 ms, Queueing time: mean = 96.250 us, max = 901.407 us, min = 6.667 us, total = 357.955 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1241 total (1 active), Execution time: mean = 8.168 us, total = 10.137 ms, Queueing time: mean = 64.759 us, max = 496.804 us, min = 12.362 us, total = 80.366 ms [state-dump] NodeManager.deadline_timer.record_metrics - 744 total (1 active), Execution time: mean = 520.567 us, total = 387.302 ms, Queueing time: mean = 372.405 us, max = 2.197 ms, min = 8.783 us, total = 277.070 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 744 total (0 active), Execution time: mean = 1.401 ms, total = 1.042 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 744 total (1 active), Execution time: mean = 294.821 us, total = 219.347 ms, Queueing time: mean = 597.696 us, max = 2.307 ms, min = 5.323 us, total = 444.686 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 744 total (0 active), Execution time: mean = 50.166 us, total = 37.323 ms, Queueing time: mean = 94.807 us, max = 237.873 us, min = 6.906 us, total = 70.536 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 372 total (1 active), 
Execution time: mean = 1.717 ms, total = 638.539 ms, Queueing time: mean = 67.206 us, max = 1.632 ms, min = 11.175 us, total = 25.000 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 62 total (1 active, 1 running), Execution time: mean = 2.561 ms, total = 158.786 ms, Queueing time: mean = 58.686 us, max = 172.215 us, min = 16.311 us, total = 3.639 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:15:16,215 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:15:16,380 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 330085 total (35 active) [state-dump] Queueing time: mean = 10.841 ms, 
max = 590.169 s, min = -0.000 s, total = 3578.508 s [state-dump] Execution time: mean = 11.063 ms, total = 3651.693 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 79358 total (0 active), Execution time: mean = 32.970 us, total = 2.616 s, Queueing time: mean = 97.515 us, max = 23.460 ms, min = 2.000 us, total = 7.739 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 79358 total (0 active), Execution time: mean = 471.310 us, total = 37.402 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 37768 total (1 active), Execution time: mean = 9.744 us, total = 367.997 ms, Queueing time: mean = 81.389 us, max = 6.066 ms, min = -0.000 s, total = 3.074 s [state-dump] NodeManager.CheckGC - 37768 total (1 active), Execution time: mean = 3.093 us, total = 116.819 ms, Queueing time: mean = 87.135 us, max = 6.066 ms, min = 3.126 us, total = 3.291 s [state-dump] ObjectManager.UpdateAvailableMemory - 37767 total (0 active), Execution time: mean = 5.217 us, total = 197.014 ms, Queueing time: mean = 92.319 us, max = 1.104 ms, min = 2.197 us, total = 3.487 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18894 total (1 active), Execution time: mean = 16.598 us, total = 313.610 ms, Queueing time: mean = 66.856 us, max = 2.889 ms, min = -0.000 s, total = 1.263 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15091 total (1 active), Execution time: mean = 437.026 us, total = 6.595 s, Queueing time: mean = 67.036 us, max = 13.366 ms, min = 93.000 ns, total = 1.012 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3780 total (1 active), Execution time: mean = 8.563 us, total = 32.370 ms, Queueing time: mean = 172.352 us, max = 3.537 ms, min = -0.000 s, total = 651.489 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3780 total (1 active), Execution time: mean = 15.074 us, 
total = 56.978 ms, Queueing time: mean = 63.365 us, max = 2.658 ms, min = 7.553 us, total = 239.520 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3780 total (1 active), Execution time: mean = 3.211 us, total = 12.139 ms, Queueing time: mean = 175.906 us, max = 3.551 ms, min = 4.207 us, total = 664.925 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3778 total (0 active), Execution time: mean = 570.645 us, total = 2.156 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3778 total (0 active), Execution time: mean = 101.042 us, total = 381.735 ms, Queueing time: mean = 96.608 us, max = 901.407 us, min = 6.667 us, total = 364.987 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1261 total (1 active), Execution time: mean = 8.169 us, total = 10.301 ms, Queueing time: mean = 64.815 us, max = 496.804 us, min = 12.362 us, total = 81.731 ms [state-dump] NodeManager.deadline_timer.record_metrics - 756 total (1 active), Execution time: mean = 521.247 us, total = 394.063 ms, Queueing time: mean = 372.548 us, max = 2.197 ms, min = 8.783 us, total = 281.646 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 756 total (0 active), Execution time: mean = 1.405 ms, total = 1.062 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 756 total (1 active), Execution time: mean = 295.956 us, total = 223.743 ms, Queueing time: mean = 597.374 us, max = 2.307 ms, min = 5.323 us, total = 451.615 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 756 total (0 active), Execution time: mean = 50.246 us, total = 37.986 ms, Queueing time: mean = 95.264 us, max = 237.873 us, min = 6.906 us, total = 72.019 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 378 total (1 active), 
Execution time: mean = 1.717 ms, total = 649.192 ms, Queueing time: mean = 67.321 us, max = 1.632 ms, min = 11.175 us, total = 25.447 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 63 total (1 active, 1 running), Execution time: mean = 2.567 ms, total = 161.749 ms, Queueing time: mean = 59.169 us, max = 172.215 us, min = 16.311 us, total = 3.728 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:16:16,215 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:16:16,382 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 335319 total (35 active) [state-dump] Queueing time: mean = 10.673 ms, 
max = 590.169 s, min = -0.000 s, total = 3578.918 s [state-dump] Execution time: mean = 10.893 ms, total = 3652.639 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 80618 total (0 active), Execution time: mean = 33.037 us, total = 2.663 s, Queueing time: mean = 97.775 us, max = 23.460 ms, min = 2.000 us, total = 7.882 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 80618 total (0 active), Execution time: mean = 472.457 us, total = 38.089 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 38368 total (1 active), Execution time: mean = 9.742 us, total = 373.792 ms, Queueing time: mean = 81.520 us, max = 6.066 ms, min = -0.000 s, total = 3.128 s [state-dump] NodeManager.CheckGC - 38368 total (1 active), Execution time: mean = 3.091 us, total = 118.597 ms, Queueing time: mean = 87.265 us, max = 6.066 ms, min = 3.126 us, total = 3.348 s [state-dump] ObjectManager.UpdateAvailableMemory - 38367 total (0 active), Execution time: mean = 5.227 us, total = 200.557 ms, Queueing time: mean = 92.668 us, max = 1.104 ms, min = 2.197 us, total = 3.555 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19194 total (1 active), Execution time: mean = 16.603 us, total = 318.679 ms, Queueing time: mean = 66.909 us, max = 2.889 ms, min = -0.000 s, total = 1.284 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15330 total (1 active), Execution time: mean = 437.114 us, total = 6.701 s, Queueing time: mean = 67.103 us, max = 13.366 ms, min = 93.000 ns, total = 1.029 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3840 total (1 active), Execution time: mean = 8.572 us, total = 32.918 ms, Queueing time: mean = 172.461 us, max = 3.537 ms, min = -0.000 s, total = 662.249 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3840 total (1 active), Execution time: mean = 15.092 us, 
total = 57.955 ms, Queueing time: mean = 63.438 us, max = 2.658 ms, min = 7.553 us, total = 243.601 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3840 total (1 active), Execution time: mean = 3.213 us, total = 12.337 ms, Queueing time: mean = 176.018 us, max = 3.551 ms, min = 4.207 us, total = 675.908 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3838 total (0 active), Execution time: mean = 571.675 us, total = 2.194 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3838 total (0 active), Execution time: mean = 101.078 us, total = 387.937 ms, Queueing time: mean = 96.926 us, max = 901.407 us, min = 6.667 us, total = 372.001 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1281 total (1 active), Execution time: mean = 8.168 us, total = 10.463 ms, Queueing time: mean = 64.884 us, max = 496.804 us, min = 12.362 us, total = 83.117 ms [state-dump] NodeManager.deadline_timer.record_metrics - 768 total (1 active), Execution time: mean = 521.466 us, total = 400.486 ms, Queueing time: mean = 373.026 us, max = 2.197 ms, min = 8.783 us, total = 286.484 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 768 total (0 active), Execution time: mean = 1.408 ms, total = 1.081 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 768 total (1 active), Execution time: mean = 296.325 us, total = 227.578 ms, Queueing time: mean = 597.680 us, max = 2.307 ms, min = 5.323 us, total = 459.018 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 768 total (0 active), Execution time: mean = 50.328 us, total = 38.652 ms, Queueing time: mean = 95.676 us, max = 237.873 us, min = 6.906 us, total = 73.479 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 384 total (1 active), 
Execution time: mean = 1.720 ms, total = 660.331 ms, Queueing time: mean = 67.287 us, max = 1.632 ms, min = 11.175 us, total = 25.838 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 64 total (1 active, 1 running), Execution time: mean = 2.575 ms, total = 164.778 ms, Queueing time: mean = 59.498 us, max = 172.215 us, min = 16.311 us, total = 3.808 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:17:16,216 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:17:16,386 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 340551 total (35 active) [state-dump] Queueing time: mean = 10.510 ms, 
max = 590.169 s, min = -0.000 s, total = 3579.311 s [state-dump] Execution time: mean = 10.728 ms, total = 3653.574 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 81878 total (0 active), Execution time: mean = 33.110 us, total = 2.711 s, Queueing time: mean = 98.058 us, max = 23.460 ms, min = 2.000 us, total = 8.029 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 81878 total (0 active), Execution time: mean = 473.526 us, total = 38.771 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 38967 total (1 active), Execution time: mean = 9.749 us, total = 379.883 ms, Queueing time: mean = 81.553 us, max = 6.066 ms, min = -0.000 s, total = 3.178 s [state-dump] NodeManager.CheckGC - 38967 total (1 active), Execution time: mean = 3.092 us, total = 120.480 ms, Queueing time: mean = 87.304 us, max = 6.066 ms, min = 3.126 us, total = 3.402 s [state-dump] ObjectManager.UpdateAvailableMemory - 38966 total (0 active), Execution time: mean = 5.229 us, total = 203.759 ms, Queueing time: mean = 92.711 us, max = 1.104 ms, min = 2.197 us, total = 3.613 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19494 total (1 active), Execution time: mean = 16.635 us, total = 324.281 ms, Queueing time: mean = 66.970 us, max = 2.889 ms, min = -0.000 s, total = 1.306 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15570 total (1 active), Execution time: mean = 436.937 us, total = 6.803 s, Queueing time: mean = 67.185 us, max = 13.366 ms, min = 93.000 ns, total = 1.046 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3900 total (1 active), Execution time: mean = 8.568 us, total = 33.416 ms, Queueing time: mean = 172.486 us, max = 3.537 ms, min = -0.000 s, total = 672.694 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3900 total (1 active), Execution time: mean = 15.105 us, 
total = 58.908 ms, Queueing time: mean = 63.473 us, max = 2.658 ms, min = 7.553 us, total = 247.544 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3900 total (1 active), Execution time: mean = 3.215 us, total = 12.540 ms, Queueing time: mean = 176.034 us, max = 3.551 ms, min = 4.207 us, total = 686.533 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3898 total (0 active), Execution time: mean = 572.379 us, total = 2.231 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3898 total (0 active), Execution time: mean = 101.046 us, total = 393.877 ms, Queueing time: mean = 97.112 us, max = 901.407 us, min = 6.667 us, total = 378.543 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1301 total (1 active), Execution time: mean = 8.178 us, total = 10.640 ms, Queueing time: mean = 64.912 us, max = 496.804 us, min = 12.362 us, total = 84.450 ms [state-dump] NodeManager.deadline_timer.record_metrics - 780 total (1 active), Execution time: mean = 521.747 us, total = 406.963 ms, Queueing time: mean = 372.957 us, max = 2.197 ms, min = 8.783 us, total = 290.906 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 780 total (0 active), Execution time: mean = 1.409 ms, total = 1.099 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 780 total (1 active), Execution time: mean = 296.619 us, total = 231.363 ms, Queueing time: mean = 597.592 us, max = 2.307 ms, min = 5.323 us, total = 466.122 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 780 total (0 active), Execution time: mean = 50.380 us, total = 39.296 ms, Queueing time: mean = 95.883 us, max = 237.873 us, min = 6.906 us, total = 74.789 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 390 total (1 active), 
Execution time: mean = 1.720 ms, total = 670.767 ms, Queueing time: mean = 67.335 us, max = 1.632 ms, min = 11.175 us, total = 26.261 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 65 total (1 active, 1 running), Execution time: mean = 2.563 ms, total = 166.607 ms, Queueing time: mean = 59.607 us, max = 172.215 us, min = 16.311 us, total = 3.874 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:18:16,216 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:18:16,389 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 345785 total (35 active) [state-dump] Queueing time: mean = 10.352 ms, 
max = 590.169 s, min = -0.000 s, total = 3579.667 s [state-dump] Execution time: mean = 10.568 ms, total = 3654.412 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 83138 total (0 active), Execution time: mean = 33.107 us, total = 2.752 s, Queueing time: mean = 98.159 us, max = 23.460 ms, min = 2.000 us, total = 8.161 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 83138 total (0 active), Execution time: mean = 473.553 us, total = 39.370 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 39567 total (1 active), Execution time: mean = 9.740 us, total = 385.377 ms, Queueing time: mean = 81.510 us, max = 6.066 ms, min = -0.000 s, total = 3.225 s [state-dump] NodeManager.CheckGC - 39567 total (1 active), Execution time: mean = 3.091 us, total = 122.307 ms, Queueing time: mean = 87.253 us, max = 6.066 ms, min = 3.126 us, total = 3.452 s [state-dump] ObjectManager.UpdateAvailableMemory - 39566 total (0 active), Execution time: mean = 5.225 us, total = 206.750 ms, Queueing time: mean = 92.686 us, max = 1.104 ms, min = 2.197 us, total = 3.667 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19793 total (1 active), Execution time: mean = 16.640 us, total = 329.348 ms, Queueing time: mean = 66.960 us, max = 2.889 ms, min = -0.000 s, total = 1.325 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15810 total (1 active), Execution time: mean = 436.763 us, total = 6.905 s, Queueing time: mean = 67.176 us, max = 13.366 ms, min = 93.000 ns, total = 1.062 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3960 total (1 active), Execution time: mean = 8.572 us, total = 33.945 ms, Queueing time: mean = 171.975 us, max = 3.537 ms, min = -0.000 s, total = 681.020 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3960 total (1 active), Execution time: mean = 15.102 us, 
total = 59.803 ms, Queueing time: mean = 63.417 us, max = 2.658 ms, min = 7.553 us, total = 251.133 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3960 total (1 active), Execution time: mean = 3.215 us, total = 12.731 ms, Queueing time: mean = 175.526 us, max = 3.551 ms, min = 4.207 us, total = 695.081 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3958 total (0 active), Execution time: mean = 571.985 us, total = 2.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3958 total (0 active), Execution time: mean = 100.956 us, total = 399.586 ms, Queueing time: mean = 97.071 us, max = 901.407 us, min = 6.667 us, total = 384.206 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1321 total (1 active), Execution time: mean = 8.173 us, total = 10.796 ms, Queueing time: mean = 65.002 us, max = 496.804 us, min = 12.362 us, total = 85.868 ms [state-dump] NodeManager.deadline_timer.record_metrics - 792 total (1 active), Execution time: mean = 521.515 us, total = 413.040 ms, Queueing time: mean = 370.653 us, max = 2.197 ms, min = 8.783 us, total = 293.557 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 792 total (0 active), Execution time: mean = 1.409 ms, total = 1.116 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 792 total (1 active), Execution time: mean = 296.667 us, total = 234.960 ms, Queueing time: mean = 594.935 us, max = 2.307 ms, min = 5.323 us, total = 471.189 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 792 total (0 active), Execution time: mean = 50.315 us, total = 39.849 ms, Queueing time: mean = 95.566 us, max = 237.873 us, min = 6.906 us, total = 75.688 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 396 total (1 active), 
Execution time: mean = 1.715 ms, total = 679.137 ms, Queueing time: mean = 67.388 us, max = 1.632 ms, min = 11.175 us, total = 26.685 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 66 total (1 active, 1 running), Execution time: mean = 2.590 ms, total = 170.951 ms, Queueing time: mean = 59.570 us, max = 172.215 us, min = 16.311 us, total = 3.932 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:19:16,217 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:19:16,391 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 351016 total (35 active) [state-dump] Queueing time: mean = 10.199 ms, 
max = 590.169 s, min = -0.000 s, total = 3580.033 s [state-dump] Execution time: mean = 10.413 ms, total = 3655.277 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 84398 total (0 active), Execution time: mean = 33.121 us, total = 2.795 s, Queueing time: mean = 98.246 us, max = 23.460 ms, min = 2.000 us, total = 8.292 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 84398 total (0 active), Execution time: mean = 473.843 us, total = 39.991 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 40166 total (1 active), Execution time: mean = 9.739 us, total = 391.175 ms, Queueing time: mean = 81.510 us, max = 6.066 ms, min = -0.000 s, total = 3.274 s [state-dump] NodeManager.CheckGC - 40166 total (1 active), Execution time: mean = 3.091 us, total = 124.142 ms, Queueing time: mean = 87.252 us, max = 6.066 ms, min = 3.126 us, total = 3.505 s [state-dump] ObjectManager.UpdateAvailableMemory - 40165 total (0 active), Execution time: mean = 5.226 us, total = 209.912 ms, Queueing time: mean = 92.753 us, max = 1.104 ms, min = 2.197 us, total = 3.725 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20093 total (1 active), Execution time: mean = 16.651 us, total = 334.562 ms, Queueing time: mean = 66.945 us, max = 2.889 ms, min = -0.000 s, total = 1.345 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16049 total (1 active), Execution time: mean = 436.683 us, total = 7.008 s, Queueing time: mean = 67.138 us, max = 13.366 ms, min = 93.000 ns, total = 1.077 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4020 total (1 active), Execution time: mean = 8.568 us, total = 34.444 ms, Queueing time: mean = 171.689 us, max = 3.537 ms, min = -0.000 s, total = 690.188 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4020 total (1 active), Execution time: mean = 15.105 us, 
total = 60.723 ms, Queueing time: mean = 63.426 us, max = 2.658 ms, min = 7.553 us, total = 254.973 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4020 total (1 active), Execution time: mean = 3.214 us, total = 12.919 ms, Queueing time: mean = 175.237 us, max = 3.551 ms, min = 4.207 us, total = 704.451 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4018 total (0 active), Execution time: mean = 571.898 us, total = 2.298 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4018 total (0 active), Execution time: mean = 100.906 us, total = 405.439 ms, Queueing time: mean = 97.051 us, max = 901.407 us, min = 6.667 us, total = 389.953 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1341 total (1 active), Execution time: mean = 8.170 us, total = 10.956 ms, Queueing time: mean = 65.182 us, max = 496.804 us, min = 12.362 us, total = 87.409 ms [state-dump] NodeManager.deadline_timer.record_metrics - 804 total (1 active), Execution time: mean = 521.617 us, total = 419.380 ms, Queueing time: mean = 369.272 us, max = 2.197 ms, min = 8.783 us, total = 296.895 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 804 total (0 active), Execution time: mean = 1.409 ms, total = 1.133 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 804 total (1 active), Execution time: mean = 297.002 us, total = 238.790 ms, Queueing time: mean = 593.314 us, max = 2.307 ms, min = 5.323 us, total = 477.024 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 804 total (0 active), Execution time: mean = 50.366 us, total = 40.494 ms, Queueing time: mean = 95.420 us, max = 237.873 us, min = 6.906 us, total = 76.718 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 402 total (1 active), 
Execution time: mean = 1.713 ms, total = 688.653 ms, Queueing time: mean = 67.189 us, max = 1.632 ms, min = 11.175 us, total = 27.010 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 67 total (1 active, 1 running), Execution time: mean = 2.592 ms, total = 173.654 ms, Queueing time: mean = 60.639 us, max = 172.215 us, min = 16.311 us, total = 4.063 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:20:16,217 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:20:16,393 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 356251 total (35 active) [state-dump] Queueing time: mean = 10.050 ms, 
max = 590.169 s, min = -0.000 s, total = 3580.359 s [state-dump] Execution time: mean = 10.263 ms, total = 3656.039 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 85658 total (0 active), Execution time: mean = 33.064 us, total = 2.832 s, Queueing time: mean = 98.026 us, max = 23.460 ms, min = 2.000 us, total = 8.397 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 85658 total (0 active), Execution time: mean = 473.076 us, total = 40.523 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 40766 total (1 active), Execution time: mean = 9.733 us, total = 396.790 ms, Queueing time: mean = 81.428 us, max = 6.066 ms, min = -0.000 s, total = 3.319 s [state-dump] NodeManager.CheckGC - 40766 total (1 active), Execution time: mean = 3.091 us, total = 126.015 ms, Queueing time: mean = 87.165 us, max = 6.066 ms, min = 3.126 us, total = 3.553 s [state-dump] ObjectManager.UpdateAvailableMemory - 40765 total (0 active), Execution time: mean = 5.219 us, total = 212.734 ms, Queueing time: mean = 92.592 us, max = 1.104 ms, min = 2.197 us, total = 3.775 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20393 total (1 active), Execution time: mean = 16.655 us, total = 339.653 ms, Queueing time: mean = 66.864 us, max = 2.889 ms, min = -0.000 s, total = 1.364 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16289 total (1 active), Execution time: mean = 436.350 us, total = 7.108 s, Queueing time: mean = 67.153 us, max = 13.366 ms, min = 93.000 ns, total = 1.094 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4080 total (1 active), Execution time: mean = 8.573 us, total = 34.977 ms, Queueing time: mean = 171.641 us, max = 3.537 ms, min = -0.000 s, total = 700.294 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4080 total (1 active), Execution time: mean = 15.097 us, 
total = 61.597 ms, Queueing time: mean = 63.375 us, max = 2.658 ms, min = 7.553 us, total = 258.570 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4080 total (1 active), Execution time: mean = 3.213 us, total = 13.108 ms, Queueing time: mean = 175.191 us, max = 3.551 ms, min = 4.207 us, total = 714.781 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4078 total (0 active), Execution time: mean = 571.413 us, total = 2.330 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4078 total (0 active), Execution time: mean = 100.868 us, total = 411.341 ms, Queueing time: mean = 96.960 us, max = 901.407 us, min = 6.667 us, total = 395.404 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1361 total (1 active), Execution time: mean = 8.168 us, total = 11.116 ms, Queueing time: mean = 65.448 us, max = 496.804 us, min = 12.362 us, total = 89.075 ms [state-dump] NodeManager.deadline_timer.record_metrics - 816 total (1 active), Execution time: mean = 522.222 us, total = 426.133 ms, Queueing time: mean = 368.115 us, max = 2.197 ms, min = 8.783 us, total = 300.382 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 816 total (0 active), Execution time: mean = 1.409 ms, total = 1.150 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 816 total (1 active), Execution time: mean = 297.068 us, total = 242.407 ms, Queueing time: mean = 592.802 us, max = 2.307 ms, min = 5.323 us, total = 483.726 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 816 total (0 active), Execution time: mean = 50.395 us, total = 41.122 ms, Queueing time: mean = 95.494 us, max = 237.873 us, min = 6.906 us, total = 77.923 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 408 total (1 active), 
Execution time: mean = 1.711 ms, total = 698.159 ms, Queueing time: mean = 66.882 us, max = 1.632 ms, min = 11.175 us, total = 27.288 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 68 total (1 active, 1 running), Execution time: mean = 2.580 ms, total = 175.463 ms, Queueing time: mean = 61.046 us, max = 172.215 us, min = 16.311 us, total = 4.151 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 
us, total = 2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:21:16,217 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:21:16,396 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 361482 total (35 active) [state-dump] Queueing time: mean = 9.905 ms, max 
= 590.169 s, min = -0.000 s, total = 3580.571 s [state-dump] Execution time: mean = 10.116 ms, total = 3656.604 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 86918 total (0 active), Execution time: mean = 32.903 us, total = 2.860 s, Queueing time: mean = 97.225 us, max = 23.460 ms, min = 2.000 us, total = 8.451 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 86918 total (0 active), Execution time: mean = 470.390 us, total = 40.885 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 41365 total (1 active), Execution time: mean = 9.704 us, total = 401.413 ms, Queueing time: mean = 81.065 us, max = 6.066 ms, min = -0.000 s, total = 3.353 s [state-dump] NodeManager.CheckGC - 41365 total (1 active), Execution time: mean = 3.087 us, total = 127.703 ms, Queueing time: mean = 86.781 us, max = 6.066 ms, min = 3.126 us, total = 3.590 s [state-dump] ObjectManager.UpdateAvailableMemory - 41364 total (0 active), Execution time: mean = 5.193 us, total = 214.798 ms, Queueing time: mean = 91.879 us, max = 1.104 ms, min = 2.197 us, total = 3.800 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20693 total (1 active), Execution time: mean = 16.620 us, total = 343.927 ms, Queueing time: mean = 66.528 us, max = 2.889 ms, min = -0.000 s, total = 1.377 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16528 total (1 active), Execution time: mean = 435.920 us, total = 7.205 s, Queueing time: mean = 66.792 us, max = 13.366 ms, min = 93.000 ns, total = 1.104 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4140 total (1 active), Execution time: mean = 8.552 us, total = 35.405 ms, Queueing time: mean = 171.542 us, max = 3.537 ms, min = -0.000 s, total = 710.182 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4140 total (1 active), Execution time: mean = 15.063 us, total = 
62.363 ms, Queueing time: mean = 63.082 us, max = 2.658 ms, min = 7.553 us, total = 261.159 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4140 total (1 active), Execution time: mean = 3.209 us, total = 13.285 ms, Queueing time: mean = 175.081 us, max = 3.551 ms, min = 4.207 us, total = 724.834 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4138 total (0 active), Execution time: mean = 568.803 us, total = 2.354 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4138 total (0 active), Execution time: mean = 100.635 us, total = 416.429 ms, Queueing time: mean = 96.234 us, max = 901.407 us, min = 6.667 us, total = 398.214 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1381 total (1 active), Execution time: mean = 8.147 us, total = 11.252 ms, Queueing time: mean = 65.137 us, max = 496.804 us, min = 12.362 us, total = 89.954 ms [state-dump] NodeManager.deadline_timer.record_metrics - 828 total (1 active), Execution time: mean = 521.501 us, total = 431.803 ms, Queueing time: mean = 368.232 us, max = 2.197 ms, min = 8.783 us, total = 304.896 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 828 total (0 active), Execution time: mean = 1.404 ms, total = 1.163 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 828 total (1 active), Execution time: mean = 296.745 us, total = 245.705 ms, Queueing time: mean = 592.516 us, max = 2.307 ms, min = 5.323 us, total = 490.603 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 828 total (0 active), Execution time: mean = 50.275 us, total = 41.628 ms, Queueing time: mean = 94.717 us, max = 237.873 us, min = 6.906 us, total = 78.426 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 414 total (1 active), Execution 
time: mean = 1.711 ms, total = 708.357 ms, Queueing time: mean = 66.598 us, max = 1.632 ms, min = 11.175 us, total = 27.572 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 69 total (1 active, 1 running), Execution time: mean = 2.581 ms, total = 178.057 ms, Queueing time: mean = 61.704 us, max = 172.215 us, min = 16.311 us, total = 4.258 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.844 s, total = 3598.755 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 342.969 us, total = 
2.401 ms, Queueing time: mean = 77.659 us, max = 184.802 us, min = 20.320 us, total = 543.614 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:22:16,218 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:22:16,399 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 366719 total (35 active) [state-dump] Queueing time: mean = 9.765 ms, max 
= 590.169 s, min = -0.000 s, total = 3580.975 s [state-dump] Execution time: mean = 11.610 ms, total = 4257.528 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 88178 total (0 active), Execution time: mean = 32.945 us, total = 2.905 s, Queueing time: mean = 97.455 us, max = 23.460 ms, min = 2.000 us, total = 8.593 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 88178 total (0 active), Execution time: mean = 471.299 us, total = 41.558 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 41965 total (1 active), Execution time: mean = 9.704 us, total = 407.249 ms, Queueing time: mean = 81.157 us, max = 6.066 ms, min = -0.000 s, total = 3.406 s [state-dump] NodeManager.CheckGC - 41965 total (1 active), Execution time: mean = 3.087 us, total = 129.541 ms, Queueing time: mean = 86.872 us, max = 6.066 ms, min = 3.126 us, total = 3.646 s [state-dump] ObjectManager.UpdateAvailableMemory - 41964 total (0 active), Execution time: mean = 5.199 us, total = 218.154 ms, Queueing time: mean = 92.190 us, max = 1.104 ms, min = 2.197 us, total = 3.869 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20993 total (1 active), Execution time: mean = 16.630 us, total = 349.110 ms, Queueing time: mean = 66.692 us, max = 2.889 ms, min = -0.000 s, total = 1.400 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16768 total (1 active), Execution time: mean = 435.618 us, total = 7.304 s, Queueing time: mean = 66.881 us, max = 13.366 ms, min = 93.000 ns, total = 1.121 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4200 total (1 active), Execution time: mean = 8.552 us, total = 35.919 ms, Queueing time: mean = 171.238 us, max = 3.537 ms, min = -0.000 s, total = 719.199 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4200 total (1 active), Execution time: mean = 15.051 us, total = 
63.214 ms, Queueing time: mean = 63.201 us, max = 2.658 ms, min = 7.553 us, total = 265.446 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4200 total (1 active), Execution time: mean = 3.210 us, total = 13.484 ms, Queueing time: mean = 174.774 us, max = 3.551 ms, min = 4.207 us, total = 734.053 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4198 total (0 active), Execution time: mean = 570.192 us, total = 2.394 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4198 total (0 active), Execution time: mean = 100.723 us, total = 422.836 ms, Queueing time: mean = 96.807 us, max = 901.407 us, min = 6.667 us, total = 406.394 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1401 total (1 active), Execution time: mean = 8.148 us, total = 11.415 ms, Queueing time: mean = 65.712 us, max = 496.804 us, min = 12.362 us, total = 92.062 ms [state-dump] NodeManager.deadline_timer.record_metrics - 840 total (1 active), Execution time: mean = 521.113 us, total = 437.735 ms, Queueing time: mean = 367.199 us, max = 2.197 ms, min = 8.783 us, total = 308.447 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 840 total (0 active), Execution time: mean = 1.405 ms, total = 1.180 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 840 total (1 active), Execution time: mean = 296.803 us, total = 249.314 ms, Queueing time: mean = 591.092 us, max = 2.307 ms, min = 5.323 us, total = 496.517 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 840 total (0 active), Execution time: mean = 50.334 us, total = 42.281 ms, Queueing time: mean = 95.197 us, max = 237.873 us, min = 6.906 us, total = 79.965 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 420 total (1 active), Execution 
time: mean = 1.708 ms, total = 717.511 ms, Queueing time: mean = 66.287 us, max = 1.632 ms, min = 11.175 us, total = 27.841 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 70 total (1 active, 1 running), Execution time: mean = 2.581 ms, total = 180.673 ms, Queueing time: mean = 61.975 us, max = 172.215 us, min = 16.311 us, total = 4.338 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 us, total = 
2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:23:16,218 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:23:16,402 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 371950 total (35 active) [state-dump] Queueing time: mean = 9.629 ms, max 
= 590.169 s, min = -0.000 s, total = 3581.367 s [state-dump] Execution time: mean = 11.449 ms, total = 4258.437 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 89438 total (0 active), Execution time: mean = 32.969 us, total = 2.949 s, Queueing time: mean = 97.684 us, max = 23.460 ms, min = 1.733 us, total = 8.737 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 89438 total (0 active), Execution time: mean = 472.002 us, total = 42.215 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 42564 total (1 active), Execution time: mean = 9.702 us, total = 412.946 ms, Queueing time: mean = 81.153 us, max = 6.066 ms, min = -0.000 s, total = 3.454 s [state-dump] NodeManager.CheckGC - 42564 total (1 active), Execution time: mean = 3.085 us, total = 131.316 ms, Queueing time: mean = 86.867 us, max = 6.066 ms, min = 3.126 us, total = 3.697 s [state-dump] ObjectManager.UpdateAvailableMemory - 42563 total (0 active), Execution time: mean = 5.201 us, total = 221.369 ms, Queueing time: mean = 92.278 us, max = 1.104 ms, min = 2.197 us, total = 3.928 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21293 total (1 active), Execution time: mean = 16.614 us, total = 353.772 ms, Queueing time: mean = 66.732 us, max = 2.889 ms, min = -0.000 s, total = 1.421 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17007 total (1 active), Execution time: mean = 435.551 us, total = 7.407 s, Queueing time: mean = 66.889 us, max = 13.366 ms, min = 93.000 ns, total = 1.138 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4260 total (1 active), Execution time: mean = 8.549 us, total = 36.419 ms, Queueing time: mean = 171.378 us, max = 3.537 ms, min = -0.000 s, total = 730.072 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4260 total (1 active), Execution time: mean = 15.040 us, total = 
64.072 ms, Queueing time: mean = 63.279 us, max = 2.658 ms, min = 7.553 us, total = 269.568 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4260 total (1 active), Execution time: mean = 3.208 us, total = 13.668 ms, Queueing time: mean = 174.914 us, max = 3.551 ms, min = 4.207 us, total = 745.135 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4258 total (0 active), Execution time: mean = 571.576 us, total = 2.434 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4258 total (0 active), Execution time: mean = 100.708 us, total = 428.817 ms, Queueing time: mean = 97.730 us, max = 2.573 ms, min = 6.667 us, total = 416.135 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1421 total (1 active), Execution time: mean = 8.124 us, total = 11.544 ms, Queueing time: mean = 66.048 us, max = 496.804 us, min = 12.362 us, total = 93.855 ms [state-dump] NodeManager.deadline_timer.record_metrics - 852 total (1 active), Execution time: mean = 521.082 us, total = 443.962 ms, Queueing time: mean = 367.884 us, max = 2.197 ms, min = 8.783 us, total = 313.437 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 852 total (0 active), Execution time: mean = 1.407 ms, total = 1.199 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 852 total (1 active), Execution time: mean = 296.972 us, total = 253.020 ms, Queueing time: mean = 591.635 us, max = 2.307 ms, min = 5.323 us, total = 504.073 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 852 total (0 active), Execution time: mean = 50.432 us, total = 42.968 ms, Queueing time: mean = 95.357 us, max = 237.873 us, min = 6.906 us, total = 81.244 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 426 total (1 active), Execution 
time: mean = 1.709 ms, total = 728.119 ms, Queueing time: mean = 66.298 us, max = 1.632 ms, min = 11.175 us, total = 28.243 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 71 total (1 active, 1 running), Execution time: mean = 2.586 ms, total = 183.607 ms, Queueing time: mean = 62.139 us, max = 172.215 us, min = 16.311 us, total = 4.412 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 us, total = 
2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:24:16,218 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:24:16,404 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 377185 total (35 active) [state-dump] Queueing time: mean = 9.496 ms, max 
= 590.169 s, min = -0.000 s, total = 3581.708 s [state-dump] Execution time: mean = 11.292 ms, total = 4259.248 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 90698 total (0 active), Execution time: mean = 32.945 us, total = 2.988 s, Queueing time: mean = 97.598 us, max = 23.460 ms, min = 1.733 us, total = 8.852 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 90698 total (0 active), Execution time: mean = 471.756 us, total = 42.787 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 43164 total (1 active), Execution time: mean = 9.695 us, total = 418.466 ms, Queueing time: mean = 81.095 us, max = 6.066 ms, min = -0.000 s, total = 3.500 s [state-dump] NodeManager.CheckGC - 43164 total (1 active), Execution time: mean = 3.085 us, total = 133.158 ms, Queueing time: mean = 86.804 us, max = 6.066 ms, min = 3.126 us, total = 3.747 s [state-dump] ObjectManager.UpdateAvailableMemory - 43163 total (0 active), Execution time: mean = 5.200 us, total = 224.463 ms, Queueing time: mean = 92.133 us, max = 1.104 ms, min = 2.197 us, total = 3.977 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21593 total (1 active), Execution time: mean = 16.616 us, total = 358.785 ms, Queueing time: mean = 66.744 us, max = 2.889 ms, min = -0.000 s, total = 1.441 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17247 total (1 active), Execution time: mean = 435.238 us, total = 7.507 s, Queueing time: mean = 66.771 us, max = 13.366 ms, min = 93.000 ns, total = 1.152 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4320 total (1 active), Execution time: mean = 8.543 us, total = 36.904 ms, Queueing time: mean = 171.567 us, max = 3.537 ms, min = -0.000 s, total = 741.169 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4320 total (1 active), Execution time: mean = 15.027 us, total = 
64.915 ms, Queueing time: mean = 63.165 us, max = 2.658 ms, min = 7.553 us, total = 272.874 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4320 total (1 active), Execution time: mean = 3.209 us, total = 13.861 ms, Queueing time: mean = 175.098 us, max = 3.551 ms, min = 3.611 us, total = 756.425 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4318 total (0 active), Execution time: mean = 571.527 us, total = 2.468 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4318 total (0 active), Execution time: mean = 100.642 us, total = 434.573 ms, Queueing time: mean = 97.770 us, max = 2.573 ms, min = 6.667 us, total = 422.172 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1441 total (1 active), Execution time: mean = 8.111 us, total = 11.688 ms, Queueing time: mean = 66.075 us, max = 496.804 us, min = 11.179 us, total = 95.214 ms [state-dump] NodeManager.deadline_timer.record_metrics - 864 total (1 active), Execution time: mean = 521.507 us, total = 450.582 ms, Queueing time: mean = 368.441 us, max = 2.197 ms, min = 8.783 us, total = 318.333 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 864 total (0 active), Execution time: mean = 1.408 ms, total = 1.216 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 864 total (1 active), Execution time: mean = 297.385 us, total = 256.940 ms, Queueing time: mean = 592.188 us, max = 2.307 ms, min = 5.323 us, total = 511.650 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 864 total (0 active), Execution time: mean = 50.476 us, total = 43.611 ms, Queueing time: mean = 95.230 us, max = 237.873 us, min = 6.906 us, total = 82.278 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 432 total (1 active), Execution 
time: mean = 1.711 ms, total = 739.028 ms, Queueing time: mean = 66.471 us, max = 1.632 ms, min = 11.175 us, total = 28.715 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 72 total (1 active, 1 running), Execution time: mean = 2.592 ms, total = 186.617 ms, Queueing time: mean = 62.307 us, max = 172.215 us, min = 16.311 us, total = 4.486 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 us, total = 
2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:25:16,218 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:25:16,407 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 382416 total (35 active) [state-dump] Queueing time: mean = 9.367 ms, max 
= 590.169 s, min = -0.000 s, total = 3582.013 s [state-dump] Execution time: mean = 11.140 ms, total = 4260.014 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 91958 total (0 active), Execution time: mean = 32.880 us, total = 3.024 s, Queueing time: mean = 97.402 us, max = 23.460 ms, min = 1.733 us, total = 8.957 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 91958 total (0 active), Execution time: mean = 471.164 us, total = 43.327 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 43763 total (1 active), Execution time: mean = 9.683 us, total = 423.749 ms, Queueing time: mean = 80.997 us, max = 6.066 ms, min = -0.000 s, total = 3.545 s [state-dump] NodeManager.CheckGC - 43763 total (1 active), Execution time: mean = 3.083 us, total = 134.903 ms, Queueing time: mean = 86.697 us, max = 6.066 ms, min = 3.126 us, total = 3.794 s [state-dump] ObjectManager.UpdateAvailableMemory - 43762 total (0 active), Execution time: mean = 5.190 us, total = 227.138 ms, Queueing time: mean = 91.760 us, max = 1.104 ms, min = 2.197 us, total = 4.016 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21893 total (1 active), Execution time: mean = 16.604 us, total = 363.516 ms, Queueing time: mean = 66.731 us, max = 2.889 ms, min = -0.000 s, total = 1.461 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17486 total (1 active), Execution time: mean = 435.080 us, total = 7.608 s, Queueing time: mean = 66.662 us, max = 13.366 ms, min = 93.000 ns, total = 1.166 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4380 total (1 active), Execution time: mean = 8.531 us, total = 37.365 ms, Queueing time: mean = 171.149 us, max = 3.537 ms, min = -0.000 s, total = 749.633 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4380 total (1 active), Execution time: mean = 15.029 us, total = 
65.827 ms, Queueing time: mean = 63.147 us, max = 2.658 ms, min = 7.553 us, total = 276.586 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4380 total (1 active), Execution time: mean = 3.209 us, total = 14.055 ms, Queueing time: mean = 174.671 us, max = 3.551 ms, min = 2.496 us, total = 765.059 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4378 total (0 active), Execution time: mean = 570.716 us, total = 2.499 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4378 total (0 active), Execution time: mean = 100.547 us, total = 440.194 ms, Queueing time: mean = 97.489 us, max = 2.573 ms, min = 6.667 us, total = 426.805 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1461 total (1 active), Execution time: mean = 8.100 us, total = 11.833 ms, Queueing time: mean = 65.829 us, max = 496.804 us, min = 11.179 us, total = 96.176 ms [state-dump] NodeManager.deadline_timer.record_metrics - 876 total (1 active), Execution time: mean = 521.140 us, total = 456.518 ms, Queueing time: mean = 366.946 us, max = 2.197 ms, min = 8.783 us, total = 321.445 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 876 total (0 active), Execution time: mean = 1.407 ms, total = 1.233 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 876 total (1 active), Execution time: mean = 297.203 us, total = 260.350 ms, Queueing time: mean = 590.366 us, max = 2.307 ms, min = 5.323 us, total = 517.161 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 876 total (0 active), Execution time: mean = 50.376 us, total = 44.130 ms, Queueing time: mean = 94.849 us, max = 237.873 us, min = 6.906 us, total = 83.088 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 438 total (1 active), Execution 
time: mean = 1.708 ms, total = 748.244 ms, Queueing time: mean = 66.136 us, max = 1.632 ms, min = 11.175 us, total = 28.968 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 73 total (1 active, 1 running), Execution time: mean = 2.579 ms, total = 188.300 ms, Queueing time: mean = 62.148 us, max = 172.215 us, min = 16.311 us, total = 4.537 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 us, total = 
2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:26:16,219 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:26:16,410 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 387651 total (35 active) [state-dump] Queueing time: mean = 9.241 ms, max 
= 590.169 s, min = -0.000 s, total = 3582.363 s [state-dump] Execution time: mean = 10.991 ms, total = 4260.847 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 93218 total (0 active), Execution time: mean = 32.867 us, total = 3.064 s, Queueing time: mean = 97.338 us, max = 23.460 ms, min = 1.733 us, total = 9.074 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 93218 total (0 active), Execution time: mean = 471.177 us, total = 43.922 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 44363 total (1 active), Execution time: mean = 9.664 us, total = 428.708 ms, Queueing time: mean = 80.925 us, max = 6.066 ms, min = -0.000 s, total = 3.590 s [state-dump] NodeManager.CheckGC - 44363 total (1 active), Execution time: mean = 3.078 us, total = 136.565 ms, Queueing time: mean = 86.609 us, max = 6.066 ms, min = 3.126 us, total = 3.842 s [state-dump] ObjectManager.UpdateAvailableMemory - 44362 total (0 active), Execution time: mean = 5.188 us, total = 230.137 ms, Queueing time: mean = 91.808 us, max = 1.104 ms, min = 2.197 us, total = 4.073 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22193 total (1 active), Execution time: mean = 16.576 us, total = 367.879 ms, Queueing time: mean = 66.775 us, max = 2.889 ms, min = -0.000 s, total = 1.482 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17726 total (1 active), Execution time: mean = 434.939 us, total = 7.710 s, Queueing time: mean = 66.631 us, max = 13.366 ms, min = 93.000 ns, total = 1.181 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4440 total (1 active), Execution time: mean = 8.521 us, total = 37.833 ms, Queueing time: mean = 171.291 us, max = 3.537 ms, min = -0.000 s, total = 760.531 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4440 total (1 active), Execution time: mean = 15.024 us, total = 
66.706 ms, Queueing time: mean = 63.165 us, max = 2.658 ms, min = 7.553 us, total = 280.451 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4440 total (1 active), Execution time: mean = 3.207 us, total = 14.241 ms, Queueing time: mean = 174.808 us, max = 3.551 ms, min = 2.496 us, total = 776.146 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4438 total (0 active), Execution time: mean = 570.437 us, total = 2.532 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4438 total (0 active), Execution time: mean = 100.470 us, total = 445.888 ms, Queueing time: mean = 97.348 us, max = 2.573 ms, min = 6.667 us, total = 432.033 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1481 total (1 active), Execution time: mean = 8.099 us, total = 11.994 ms, Queueing time: mean = 65.903 us, max = 496.804 us, min = 11.179 us, total = 97.602 ms [state-dump] NodeManager.deadline_timer.record_metrics - 888 total (1 active), Execution time: mean = 521.966 us, total = 463.506 ms, Queueing time: mean = 366.508 us, max = 2.197 ms, min = 8.783 us, total = 325.459 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 888 total (0 active), Execution time: mean = 1.407 ms, total = 1.250 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 888 total (1 active), Execution time: mean = 297.373 us, total = 264.067 ms, Queueing time: mean = 590.656 us, max = 2.307 ms, min = 5.323 us, total = 524.502 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 888 total (0 active), Execution time: mean = 50.409 us, total = 44.763 ms, Queueing time: mean = 94.748 us, max = 237.873 us, min = 6.906 us, total = 84.137 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 444 total (1 active), Execution 
time: mean = 1.708 ms, total = 758.415 ms, Queueing time: mean = 65.941 us, max = 1.632 ms, min = 11.175 us, total = 29.278 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 74 total (1 active, 1 running), Execution time: mean = 2.584 ms, total = 191.186 ms, Queueing time: mean = 61.706 us, max = 172.215 us, min = 16.311 us, total = 4.566 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 
1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 us, total = 
2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.253 us, total = 31.267 us, Queueing time: mean = 45.593 us, max = 79.050 us, min = 34.595 us, total = 227.965 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:27:16,219 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:27:16,413 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 392880 total (35 active) [state-dump] Queueing time: mean = 9.120 ms, max 
= 590.169 s, min = -0.000 s, total = 3582.934 s [state-dump] Execution time: mean = 10.848 ms, total = 4261.800 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 94477 total (0 active), Execution time: mean = 32.917 us, total = 3.110 s, Queueing time: mean = 97.671 us, max = 23.460 ms, min = 1.733 us, total = 9.228 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 94477 total (0 active), Execution time: mean = 472.280 us, total = 44.620 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 44962 total (1 active), Execution time: mean = 9.665 us, total = 434.570 ms, Queueing time: mean = 82.243 us, max = 55.955 ms, min = -0.000 s, total = 3.698 s [state-dump] NodeManager.CheckGC - 44962 total (1 active), Execution time: mean = 3.079 us, total = 138.417 ms, Queueing time: mean = 87.929 us, max = 55.958 ms, min = 3.126 us, total = 3.953 s [state-dump] ObjectManager.UpdateAvailableMemory - 44961 total (0 active), Execution time: mean = 5.200 us, total = 233.786 ms, Queueing time: mean = 92.108 us, max = 1.104 ms, min = 2.197 us, total = 4.141 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22492 total (1 active), Execution time: mean = 16.592 us, total = 373.187 ms, Queueing time: mean = 68.745 us, max = 41.182 ms, min = -0.000 s, total = 1.546 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17965 total (1 active), Execution time: mean = 435.030 us, total = 7.815 s, Queueing time: mean = 66.788 us, max = 13.366 ms, min = 93.000 ns, total = 1.200 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4500 total (1 active), Execution time: mean = 8.527 us, total = 38.371 ms, Queueing time: mean = 171.272 us, max = 3.537 ms, min = -0.000 s, total = 770.723 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4500 total (1 active), Execution time: mean = 15.075 us, 
total = 67.839 ms, Queueing time: mean = 63.253 us, max = 2.658 ms, min = 7.553 us, total = 284.637 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4500 total (1 active), Execution time: mean = 3.208 us, total = 14.435 ms, Queueing time: mean = 174.791 us, max = 3.551 ms, min = 2.496 us, total = 786.557 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4498 total (0 active), Execution time: mean = 571.057 us, total = 2.569 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4498 total (0 active), Execution time: mean = 100.481 us, total = 451.963 ms, Queueing time: mean = 97.605 us, max = 2.573 ms, min = 6.667 us, total = 439.027 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1501 total (1 active), Execution time: mean = 8.116 us, total = 12.183 ms, Queueing time: mean = 66.085 us, max = 496.804 us, min = 11.179 us, total = 99.193 ms [state-dump] NodeManager.deadline_timer.record_metrics - 900 total (1 active), Execution time: mean = 521.427 us, total = 469.284 ms, Queueing time: mean = 366.922 us, max = 2.197 ms, min = 8.783 us, total = 330.229 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 900 total (0 active), Execution time: mean = 1.409 ms, total = 1.268 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 900 total (1 active), Execution time: mean = 297.620 us, total = 267.858 ms, Queueing time: mean = 590.315 us, max = 2.307 ms, min = 5.323 us, total = 531.284 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 900 total (0 active), Execution time: mean = 50.527 us, total = 45.475 ms, Queueing time: mean = 95.046 us, max = 237.873 us, min = 6.906 us, total = 85.542 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 450 total (1 active), 
Execution time: mean = 1.709 ms, total = 769.112 ms, Queueing time: mean = 65.900 us, max = 1.632 ms, min = 11.175 us, total = 29.655 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 75 total (1 active, 1 running), Execution time: mean = 2.585 ms, total = 193.897 ms, Queueing time: mean = 61.876 us, max = 172.215 us, min = 16.311 us, total = 4.641 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 
us, total = 2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:28:16,220 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:28:16,415 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 398112 total (35 active) [state-dump] Queueing time: mean = 9.001 ms, max 
= 590.169 s, min = -0.000 s, total = 3583.245 s [state-dump] Execution time: mean = 10.707 ms, total = 4262.543 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 95737 total (0 active), Execution time: mean = 32.845 us, total = 3.144 s, Queueing time: mean = 97.384 us, max = 23.460 ms, min = 1.733 us, total = 9.323 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 95737 total (0 active), Execution time: mean = 471.435 us, total = 45.134 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 45561 total (1 active), Execution time: mean = 9.657 us, total = 439.970 ms, Queueing time: mean = 82.154 us, max = 55.955 ms, min = -0.000 s, total = 3.743 s [state-dump] NodeManager.CheckGC - 45561 total (1 active), Execution time: mean = 3.075 us, total = 140.105 ms, Queueing time: mean = 87.837 us, max = 55.958 ms, min = 3.126 us, total = 4.002 s [state-dump] ObjectManager.UpdateAvailableMemory - 45560 total (0 active), Execution time: mean = 5.195 us, total = 236.694 ms, Queueing time: mean = 91.904 us, max = 1.104 ms, min = 2.197 us, total = 4.187 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22792 total (1 active), Execution time: mean = 16.593 us, total = 378.189 ms, Queueing time: mean = 68.615 us, max = 41.182 ms, min = -0.000 s, total = 1.564 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18205 total (1 active), Execution time: mean = 434.723 us, total = 7.914 s, Queueing time: mean = 66.696 us, max = 13.366 ms, min = 93.000 ns, total = 1.214 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4560 total (1 active), Execution time: mean = 8.510 us, total = 38.806 ms, Queueing time: mean = 171.235 us, max = 3.537 ms, min = -0.000 s, total = 780.833 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4560 total (1 active), Execution time: mean = 15.086 us, 
total = 68.790 ms, Queueing time: mean = 63.206 us, max = 2.658 ms, min = 7.553 us, total = 288.220 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4560 total (1 active), Execution time: mean = 3.202 us, total = 14.602 ms, Queueing time: mean = 174.747 us, max = 3.551 ms, min = 2.496 us, total = 796.849 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4558 total (0 active), Execution time: mean = 570.620 us, total = 2.601 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4558 total (0 active), Execution time: mean = 100.392 us, total = 457.588 ms, Queueing time: mean = 97.438 us, max = 2.573 ms, min = 6.667 us, total = 444.123 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1521 total (1 active), Execution time: mean = 8.106 us, total = 12.329 ms, Queueing time: mean = 65.894 us, max = 496.804 us, min = 11.179 us, total = 100.225 ms [state-dump] NodeManager.deadline_timer.record_metrics - 912 total (1 active), Execution time: mean = 521.348 us, total = 475.469 ms, Queueing time: mean = 367.074 us, max = 2.197 ms, min = 8.783 us, total = 334.771 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 912 total (0 active), Execution time: mean = 1.410 ms, total = 1.286 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 912 total (1 active), Execution time: mean = 297.770 us, total = 271.566 ms, Queueing time: mean = 590.204 us, max = 2.307 ms, min = 5.323 us, total = 538.266 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 912 total (0 active), Execution time: mean = 50.532 us, total = 46.085 ms, Queueing time: mean = 95.183 us, max = 237.873 us, min = 6.906 us, total = 86.807 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 456 total (1 active), 
Execution time: mean = 1.709 ms, total = 779.479 ms, Queueing time: mean = 65.841 us, max = 1.632 ms, min = 11.175 us, total = 30.024 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 76 total (1 active, 1 running), Execution time: mean = 2.588 ms, total = 196.688 ms, Queueing time: mean = 62.983 us, max = 172.215 us, min = 16.311 us, total = 4.787 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 
us, total = 2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:29:16,220 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:29:16,418 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 403337 total (37 active) [state-dump] Queueing time: mean = 8.885 ms, max 
= 590.169 s, min = -0.000 s, total = 3583.563 s [state-dump] Execution time: mean = 10.570 ms, total = 4263.251 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 96992 total (1 active), Execution time: mean = 32.777 us, total = 3.179 s, Queueing time: mean = 97.089 us, max = 23.460 ms, min = 1.733 us, total = 9.417 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 96992 total (1 active), Execution time: mean = 470.379 us, total = 45.623 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 46161 total (1 active), Execution time: mean = 9.643 us, total = 445.132 ms, Queueing time: mean = 82.344 us, max = 55.955 ms, min = -0.000 s, total = 3.801 s [state-dump] NodeManager.CheckGC - 46161 total (1 active), Execution time: mean = 3.073 us, total = 141.871 ms, Queueing time: mean = 88.017 us, max = 55.958 ms, min = 3.126 us, total = 4.063 s [state-dump] ObjectManager.UpdateAvailableMemory - 46160 total (0 active), Execution time: mean = 5.184 us, total = 239.308 ms, Queueing time: mean = 91.516 us, max = 1.104 ms, min = 2.197 us, total = 4.224 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23092 total (1 active), Execution time: mean = 16.565 us, total = 382.528 ms, Queueing time: mean = 68.396 us, max = 41.182 ms, min = -0.000 s, total = 1.579 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18445 total (1 active), Execution time: mean = 434.470 us, total = 8.014 s, Queueing time: mean = 66.464 us, max = 13.366 ms, min = 93.000 ns, total = 1.226 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4620 total (1 active), Execution time: mean = 8.514 us, total = 39.334 ms, Queueing time: mean = 171.251 us, max = 3.537 ms, min = -0.000 s, total = 791.182 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4620 total (1 active), Execution time: mean = 15.065 us, 
total = 69.602 ms, Queueing time: mean = 62.989 us, max = 2.658 ms, min = 7.553 us, total = 291.007 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4620 total (1 active), Execution time: mean = 3.202 us, total = 14.793 ms, Queueing time: mean = 174.765 us, max = 3.551 ms, min = 2.496 us, total = 807.414 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4618 total (0 active), Execution time: mean = 569.122 us, total = 2.628 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4618 total (0 active), Execution time: mean = 100.227 us, total = 462.849 ms, Queueing time: mean = 96.994 us, max = 2.573 ms, min = 6.667 us, total = 447.919 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1541 total (1 active), Execution time: mean = 8.094 us, total = 12.473 ms, Queueing time: mean = 65.648 us, max = 496.804 us, min = 11.179 us, total = 101.164 ms [state-dump] NodeManager.deadline_timer.record_metrics - 924 total (1 active), Execution time: mean = 520.807 us, total = 481.226 ms, Queueing time: mean = 367.205 us, max = 2.197 ms, min = 8.783 us, total = 339.297 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 924 total (0 active), Execution time: mean = 1.407 ms, total = 1.300 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 924 total (1 active), Execution time: mean = 297.716 us, total = 275.089 ms, Queueing time: mean = 590.055 us, max = 2.307 ms, min = 5.323 us, total = 545.211 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 924 total (0 active), Execution time: mean = 50.491 us, total = 46.653 ms, Queueing time: mean = 94.708 us, max = 237.873 us, min = 6.906 us, total = 87.510 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 462 total (1 active), 
Execution time: mean = 1.709 ms, total = 789.398 ms, Queueing time: mean = 65.505 us, max = 1.632 ms, min = 11.175 us, total = 30.263 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 77 total (1 active, 1 running), Execution time: mean = 2.587 ms, total = 199.230 ms, Queueing time: mean = 62.678 us, max = 172.215 us, min = 16.311 us, total = 4.826 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 
us, total = 2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:30:16,220 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:30:16,420 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 408540 total (35 active) [state-dump] Queueing time: mean = 8.773 ms, max 
= 590.169 s, min = -0.000 s, total = 3583.925 s [state-dump] Execution time: mean = 10.437 ms, total = 4264.023 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 98238 total (0 active), Execution time: mean = 32.730 us, total = 3.215 s, Queueing time: mean = 97.130 us, max = 23.460 ms, min = 1.733 us, total = 9.542 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 98238 total (0 active), Execution time: mean = 469.944 us, total = 46.166 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 46760 total (1 active), Execution time: mean = 9.640 us, total = 450.748 ms, Queueing time: mean = 82.289 us, max = 55.955 ms, min = -0.000 s, total = 3.848 s [state-dump] NodeManager.CheckGC - 46760 total (1 active), Execution time: mean = 3.073 us, total = 143.690 ms, Queueing time: mean = 87.959 us, max = 55.958 ms, min = 3.126 us, total = 4.113 s [state-dump] ObjectManager.UpdateAvailableMemory - 46759 total (0 active), Execution time: mean = 5.184 us, total = 242.406 ms, Queueing time: mean = 91.627 us, max = 1.104 ms, min = 2.197 us, total = 4.284 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23392 total (1 active), Execution time: mean = 16.561 us, total = 387.394 ms, Queueing time: mean = 68.312 us, max = 41.182 ms, min = -0.000 s, total = 1.598 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18684 total (1 active), Execution time: mean = 434.331 us, total = 8.115 s, Queueing time: mean = 66.458 us, max = 13.366 ms, min = 93.000 ns, total = 1.242 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4680 total (1 active), Execution time: mean = 8.508 us, total = 39.818 ms, Queueing time: mean = 171.381 us, max = 3.537 ms, min = -0.000 s, total = 802.061 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4680 total (1 active), Execution time: mean = 15.061 us, 
total = 70.486 ms, Queueing time: mean = 63.005 us, max = 2.658 ms, min = 7.553 us, total = 294.862 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4680 total (1 active), Execution time: mean = 3.200 us, total = 14.976 ms, Queueing time: mean = 174.890 us, max = 3.551 ms, min = 2.496 us, total = 818.484 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4678 total (0 active), Execution time: mean = 567.795 us, total = 2.656 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4678 total (0 active), Execution time: mean = 100.042 us, total = 467.998 ms, Queueing time: mean = 96.777 us, max = 2.573 ms, min = 6.667 us, total = 452.721 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1561 total (1 active), Execution time: mean = 8.083 us, total = 12.618 ms, Queueing time: mean = 65.552 us, max = 496.804 us, min = 11.179 us, total = 102.326 ms [state-dump] NodeManager.deadline_timer.record_metrics - 936 total (1 active), Execution time: mean = 521.214 us, total = 487.856 ms, Queueing time: mean = 367.767 us, max = 2.197 ms, min = 8.783 us, total = 344.230 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 936 total (0 active), Execution time: mean = 1.406 ms, total = 1.316 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 936 total (1 active), Execution time: mean = 297.963 us, total = 278.894 ms, Queueing time: mean = 590.698 us, max = 2.307 ms, min = 5.323 us, total = 552.894 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 936 total (0 active), Execution time: mean = 50.523 us, total = 47.289 ms, Queueing time: mean = 94.819 us, max = 237.873 us, min = 6.906 us, total = 88.750 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 468 total (1 active), 
Execution time: mean = 1.710 ms, total = 800.205 ms, Queueing time: mean = 65.652 us, max = 1.632 ms, min = 11.175 us, total = 30.725 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 78 total (1 active, 1 running), Execution time: mean = 2.588 ms, total = 201.869 ms, Queueing time: mean = 62.277 us, max = 172.215 us, min = 16.311 us, total = 4.858 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 
us, total = 2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:31:16,220 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:31:16,423 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 413775 total (35 active) [state-dump] Queueing time: mean = 8.662 ms, max 
= 590.169 s, min = -0.000 s, total = 3584.278 s [state-dump] Execution time: mean = 10.307 ms, total = 4264.833 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 99498 total (0 active), Execution time: mean = 32.693 us, total = 3.253 s, Queueing time: mean = 97.149 us, max = 23.460 ms, min = 1.733 us, total = 9.666 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 99498 total (0 active), Execution time: mean = 469.768 us, total = 46.741 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 47360 total (1 active), Execution time: mean = 9.630 us, total = 456.081 ms, Queueing time: mean = 82.231 us, max = 55.955 ms, min = -0.000 s, total = 3.894 s [state-dump] NodeManager.CheckGC - 47360 total (1 active), Execution time: mean = 3.070 us, total = 145.405 ms, Queueing time: mean = 87.895 us, max = 55.958 ms, min = 3.126 us, total = 4.163 s [state-dump] ObjectManager.UpdateAvailableMemory - 47359 total (0 active), Execution time: mean = 5.183 us, total = 245.461 ms, Queueing time: mean = 91.636 us, max = 1.104 ms, min = 2.197 us, total = 4.340 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23692 total (1 active), Execution time: mean = 16.566 us, total = 392.478 ms, Queueing time: mean = 68.298 us, max = 41.182 ms, min = -0.000 s, total = 1.618 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18924 total (1 active), Execution time: mean = 434.295 us, total = 8.219 s, Queueing time: mean = 66.432 us, max = 13.366 ms, min = 93.000 ns, total = 1.257 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4740 total (1 active), Execution time: mean = 8.510 us, total = 40.336 ms, Queueing time: mean = 171.266 us, max = 3.537 ms, min = -0.000 s, total = 811.803 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4740 total (1 active), Execution time: mean = 15.051 us, 
total = 71.340 ms, Queueing time: mean = 63.128 us, max = 2.658 ms, min = 7.553 us, total = 299.228 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4740 total (1 active), Execution time: mean = 3.200 us, total = 15.169 ms, Queueing time: mean = 174.777 us, max = 3.551 ms, min = 2.496 us, total = 828.441 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4738 total (0 active), Execution time: mean = 567.493 us, total = 2.689 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4738 total (0 active), Execution time: mean = 99.890 us, total = 473.277 ms, Queueing time: mean = 96.723 us, max = 2.573 ms, min = 6.667 us, total = 458.274 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1581 total (1 active), Execution time: mean = 8.096 us, total = 12.800 ms, Queueing time: mean = 65.421 us, max = 496.804 us, min = 11.179 us, total = 103.430 ms [state-dump] NodeManager.deadline_timer.record_metrics - 948 total (1 active), Execution time: mean = 521.556 us, total = 494.435 ms, Queueing time: mean = 366.883 us, max = 2.197 ms, min = 8.783 us, total = 347.806 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 948 total (0 active), Execution time: mean = 1.406 ms, total = 1.333 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 948 total (1 active), Execution time: mean = 298.217 us, total = 282.710 ms, Queueing time: mean = 589.783 us, max = 2.307 ms, min = 5.323 us, total = 559.115 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 948 total (0 active), Execution time: mean = 50.495 us, total = 47.870 ms, Queueing time: mean = 94.366 us, max = 237.873 us, min = 6.906 us, total = 89.459 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 474 total (1 active), 
Execution time: mean = 1.709 ms, total = 810.010 ms, Queueing time: mean = 65.661 us, max = 1.632 ms, min = 11.175 us, total = 31.123 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 79 total (1 active, 1 running), Execution time: mean = 2.582 ms, total = 203.994 ms, Queueing time: mean = 61.916 us, max = 172.215 us, min = 16.311 us, total = 4.891 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.529 s, total = 4198.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 345.339 
us, total = 2.763 ms, Queueing time: mean = 73.894 us, max = 184.802 us, min = 20.320 us, total = 591.153 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:32:16,221 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:32:16,426 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 419008 total (35 active) [state-dump] Queueing time: mean = 8.555 ms, max 
= 590.169 s, min = -0.000 s, total = 3584.570 s [state-dump] Execution time: mean = 11.612 ms, total = 4865.536 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 100758 total (0 active), Execution time: mean = 32.612 us, total = 3.286 s, Queueing time: mean = 96.828 us, max = 23.460 ms, min = 1.733 us, total = 9.756 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 100758 total (0 active), Execution time: mean = 468.646 us, total = 47.220 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 47959 total (1 active), Execution time: mean = 9.618 us, total = 461.281 ms, Queueing time: mean = 82.082 us, max = 55.955 ms, min = -0.000 s, total = 3.937 s [state-dump] NodeManager.CheckGC - 47959 total (1 active), Execution time: mean = 3.067 us, total = 147.084 ms, Queueing time: mean = 87.738 us, max = 55.958 ms, min = 3.126 us, total = 4.208 s [state-dump] ObjectManager.UpdateAvailableMemory - 47958 total (0 active), Execution time: mean = 5.175 us, total = 248.189 ms, Queueing time: mean = 91.360 us, max = 1.104 ms, min = 2.197 us, total = 4.381 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23992 total (1 active), Execution time: mean = 16.553 us, total = 397.134 ms, Queueing time: mean = 68.191 us, max = 41.182 ms, min = -0.000 s, total = 1.636 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19163 total (1 active), Execution time: mean = 434.165 us, total = 8.320 s, Queueing time: mean = 66.260 us, max = 13.366 ms, min = 93.000 ns, total = 1.270 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4800 total (1 active), Execution time: mean = 8.499 us, total = 40.795 ms, Queueing time: mean = 171.296 us, max = 3.537 ms, min = -0.000 s, total = 822.222 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4800 total (1 active), Execution time: mean = 15.021 us, 
total = 72.101 ms, Queueing time: mean = 62.935 us, max = 2.658 ms, min = 7.553 us, total = 302.089 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4800 total (1 active), Execution time: mean = 3.196 us, total = 15.340 ms, Queueing time: mean = 174.800 us, max = 3.551 ms, min = 2.496 us, total = 839.042 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4798 total (0 active), Execution time: mean = 566.322 us, total = 2.717 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4798 total (0 active), Execution time: mean = 99.728 us, total = 478.493 ms, Queueing time: mean = 96.394 us, max = 2.573 ms, min = 6.667 us, total = 462.498 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1601 total (1 active), Execution time: mean = 8.092 us, total = 12.956 ms, Queueing time: mean = 65.273 us, max = 496.804 us, min = 11.179 us, total = 104.502 ms [state-dump] NodeManager.deadline_timer.record_metrics - 960 total (1 active), Execution time: mean = 521.629 us, total = 500.764 ms, Queueing time: mean = 367.071 us, max = 2.197 ms, min = 8.783 us, total = 352.388 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 960 total (0 active), Execution time: mean = 1.404 ms, total = 1.348 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 960 total (1 active), Execution time: mean = 298.152 us, total = 286.226 ms, Queueing time: mean = 590.090 us, max = 2.307 ms, min = 5.323 us, total = 566.487 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 960 total (0 active), Execution time: mean = 50.433 us, total = 48.416 ms, Queueing time: mean = 94.116 us, max = 237.873 us, min = 6.906 us, total = 90.351 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 480 total (1 active), 
Execution time: mean = 1.710 ms, total = 820.838 ms, Queueing time: mean = 65.565 us, max = 1.632 ms, min = 11.175 us, total = 31.471 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 80 total (1 active, 1 running), Execution time: mean = 2.587 ms, total = 206.924 ms, Queueing time: mean = 62.027 us, max = 172.215 us, min = 16.311 us, total = 4.962 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:33:16,221 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:33:16,429 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 424243 total (35 active) [state-dump] Queueing time: mean = 8.450 ms, max 
= 590.169 s, min = -0.000 s, total = 3584.759 s [state-dump] Execution time: mean = 11.470 ms, total = 4866.068 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 102018 total (0 active), Execution time: mean = 32.448 us, total = 3.310 s, Queueing time: mean = 96.077 us, max = 23.460 ms, min = 1.733 us, total = 9.802 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 102018 total (0 active), Execution time: mean = 466.130 us, total = 47.554 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 48559 total (1 active), Execution time: mean = 9.602 us, total = 466.243 ms, Queueing time: mean = 81.767 us, max = 55.955 ms, min = -0.000 s, total = 3.971 s [state-dump] NodeManager.CheckGC - 48559 total (1 active), Execution time: mean = 3.065 us, total = 148.819 ms, Queueing time: mean = 87.411 us, max = 55.958 ms, min = 3.126 us, total = 4.245 s [state-dump] ObjectManager.UpdateAvailableMemory - 48558 total (0 active), Execution time: mean = 5.154 us, total = 250.273 ms, Queueing time: mean = 90.660 us, max = 1.104 ms, min = 2.197 us, total = 4.402 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24292 total (1 active), Execution time: mean = 16.500 us, total = 400.821 ms, Queueing time: mean = 67.817 us, max = 41.182 ms, min = -0.000 s, total = 1.647 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19403 total (1 active), Execution time: mean = 433.912 us, total = 8.419 s, Queueing time: mean = 65.963 us, max = 13.366 ms, min = 93.000 ns, total = 1.280 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4860 total (1 active), Execution time: mean = 8.471 us, total = 41.171 ms, Queueing time: mean = 170.895 us, max = 3.537 ms, min = -0.000 s, total = 830.550 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4860 total (1 active), Execution time: mean = 14.982 us, 
total = 72.812 ms, Queueing time: mean = 62.576 us, max = 2.658 ms, min = 7.553 us, total = 304.119 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4860 total (1 active), Execution time: mean = 3.191 us, total = 15.508 ms, Queueing time: mean = 174.386 us, max = 3.551 ms, min = 2.496 us, total = 847.515 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4858 total (0 active), Execution time: mean = 563.851 us, total = 2.739 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4858 total (0 active), Execution time: mean = 99.501 us, total = 483.374 ms, Queueing time: mean = 95.578 us, max = 2.573 ms, min = 6.667 us, total = 464.319 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1621 total (1 active), Execution time: mean = 8.081 us, total = 13.099 ms, Queueing time: mean = 65.151 us, max = 496.804 us, min = 11.179 us, total = 105.610 ms [state-dump] NodeManager.deadline_timer.record_metrics - 972 total (1 active), Execution time: mean = 520.790 us, total = 506.208 ms, Queueing time: mean = 365.856 us, max = 2.197 ms, min = 8.783 us, total = 355.612 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 972 total (0 active), Execution time: mean = 1.400 ms, total = 1.361 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 972 total (1 active), Execution time: mean = 297.875 us, total = 289.534 ms, Queueing time: mean = 588.310 us, max = 2.307 ms, min = 5.323 us, total = 571.837 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 972 total (0 active), Execution time: mean = 50.307 us, total = 48.899 ms, Queueing time: mean = 93.330 us, max = 237.873 us, min = 6.906 us, total = 90.717 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 486 total (1 active), 
Execution time: mean = 1.707 ms, total = 829.655 ms, Queueing time: mean = 65.068 us, max = 1.632 ms, min = 11.175 us, total = 31.623 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 81 total (1 active, 1 running), Execution time: mean = 2.590 ms, total = 209.773 ms, Queueing time: mean = 61.521 us, max = 172.215 us, min = 16.311 us, total = 4.983 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:34:16,221 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:34:16,431 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 429477 total (35 active) [state-dump] Queueing time: mean = 8.347 ms, max 
= 590.169 s, min = -0.000 s, total = 3584.921 s [state-dump] Execution time: mean = 11.331 ms, total = 4866.563 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 103278 total (0 active), Execution time: mean = 32.274 us, total = 3.333 s, Queueing time: mean = 95.284 us, max = 23.460 ms, min = 1.733 us, total = 9.841 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 103278 total (0 active), Execution time: mean = 463.432 us, total = 47.862 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 49159 total (1 active), Execution time: mean = 9.562 us, total = 470.058 ms, Queueing time: mean = 81.325 us, max = 55.955 ms, min = -0.000 s, total = 3.998 s [state-dump] NodeManager.CheckGC - 49159 total (1 active), Execution time: mean = 3.060 us, total = 150.407 ms, Queueing time: mean = 86.939 us, max = 55.958 ms, min = 3.126 us, total = 4.274 s [state-dump] ObjectManager.UpdateAvailableMemory - 49158 total (0 active), Execution time: mean = 5.128 us, total = 252.061 ms, Queueing time: mean = 89.814 us, max = 1.104 ms, min = 2.197 us, total = 4.415 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24592 total (1 active), Execution time: mean = 16.427 us, total = 403.968 ms, Queueing time: mean = 67.386 us, max = 41.182 ms, min = -0.000 s, total = 1.657 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19642 total (1 active), Execution time: mean = 433.512 us, total = 8.515 s, Queueing time: mean = 65.616 us, max = 13.366 ms, min = 93.000 ns, total = 1.289 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4920 total (1 active), Execution time: mean = 8.455 us, total = 41.599 ms, Queueing time: mean = 170.714 us, max = 3.537 ms, min = -0.000 s, total = 839.911 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4920 total (1 active), Execution time: mean = 14.946 us, 
total = 73.535 ms, Queueing time: mean = 62.259 us, max = 2.658 ms, min = 7.553 us, total = 306.315 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4920 total (1 active), Execution time: mean = 3.186 us, total = 15.675 ms, Queueing time: mean = 174.196 us, max = 3.551 ms, min = 2.496 us, total = 857.045 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4918 total (0 active), Execution time: mean = 560.700 us, total = 2.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4918 total (0 active), Execution time: mean = 99.224 us, total = 487.984 ms, Queueing time: mean = 94.680 us, max = 2.573 ms, min = 6.667 us, total = 465.635 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1641 total (1 active), Execution time: mean = 8.063 us, total = 13.231 ms, Queueing time: mean = 64.844 us, max = 496.804 us, min = 11.179 us, total = 106.409 ms [state-dump] NodeManager.deadline_timer.record_metrics - 984 total (1 active), Execution time: mean = 520.128 us, total = 511.806 ms, Queueing time: mean = 365.514 us, max = 2.197 ms, min = 6.917 us, total = 359.666 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 984 total (0 active), Execution time: mean = 1.395 ms, total = 1.372 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 984 total (1 active), Execution time: mean = 297.378 us, total = 292.620 ms, Queueing time: mean = 587.821 us, max = 2.307 ms, min = 5.323 us, total = 578.416 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 984 total (0 active), Execution time: mean = 50.178 us, total = 49.375 ms, Queueing time: mean = 92.462 us, max = 237.873 us, min = 6.906 us, total = 90.983 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 492 total (1 active), 
Execution time: mean = 1.705 ms, total = 838.821 ms, Queueing time: mean = 64.724 us, max = 1.632 ms, min = 11.175 us, total = 31.844 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 82 total (1 active, 1 running), Execution time: mean = 2.590 ms, total = 212.376 ms, Queueing time: mean = 61.143 us, max = 172.215 us, min = 16.311 us, total = 5.014 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:35:16,221 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:35:16,434 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 434709 total (35 active) [state-dump] Queueing time: mean = 8.247 ms, max 
= 590.169 s, min = -0.000 s, total = 3585.171 s [state-dump] Execution time: mean = 11.196 ms, total = 4867.202 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 104538 total (0 active), Execution time: mean = 32.156 us, total = 3.362 s, Queueing time: mean = 94.850 us, max = 23.460 ms, min = 1.733 us, total = 9.915 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 104538 total (0 active), Execution time: mean = 461.975 us, total = 48.294 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 49758 total (1 active), Execution time: mean = 9.542 us, total = 474.807 ms, Queueing time: mean = 81.115 us, max = 55.955 ms, min = -0.000 s, total = 4.036 s [state-dump] NodeManager.CheckGC - 49758 total (1 active), Execution time: mean = 3.057 us, total = 152.124 ms, Queueing time: mean = 86.713 us, max = 55.958 ms, min = 3.126 us, total = 4.315 s [state-dump] ObjectManager.UpdateAvailableMemory - 49757 total (0 active), Execution time: mean = 5.115 us, total = 254.526 ms, Queueing time: mean = 89.391 us, max = 1.104 ms, min = 2.197 us, total = 4.448 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24892 total (1 active), Execution time: mean = 16.405 us, total = 408.354 ms, Queueing time: mean = 67.269 us, max = 41.182 ms, min = -0.000 s, total = 1.674 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19882 total (1 active), Execution time: mean = 433.290 us, total = 8.615 s, Queueing time: mean = 65.388 us, max = 13.366 ms, min = 93.000 ns, total = 1.300 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 4980 total (1 active), Execution time: mean = 8.449 us, total = 42.075 ms, Queueing time: mean = 170.515 us, max = 3.537 ms, min = -0.000 s, total = 849.162 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4980 total (1 active), Execution time: mean = 14.916 us, 
total = 74.284 ms, Queueing time: mean = 61.941 us, max = 2.658 ms, min = 7.553 us, total = 308.467 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4980 total (1 active), Execution time: mean = 3.183 us, total = 15.852 ms, Queueing time: mean = 173.995 us, max = 3.551 ms, min = 2.496 us, total = 866.493 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4978 total (0 active), Execution time: mean = 558.788 us, total = 2.782 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4978 total (0 active), Execution time: mean = 99.069 us, total = 493.164 ms, Queueing time: mean = 94.122 us, max = 2.573 ms, min = 6.667 us, total = 468.541 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1661 total (1 active), Execution time: mean = 8.051 us, total = 13.372 ms, Queueing time: mean = 64.607 us, max = 496.804 us, min = 11.179 us, total = 107.313 ms [state-dump] NodeManager.deadline_timer.record_metrics - 996 total (1 active), Execution time: mean = 519.803 us, total = 517.724 ms, Queueing time: mean = 364.568 us, max = 2.197 ms, min = 6.917 us, total = 363.110 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 996 total (0 active), Execution time: mean = 1.392 ms, total = 1.386 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 996 total (1 active), Execution time: mean = 297.086 us, total = 295.898 ms, Queueing time: mean = 586.896 us, max = 2.307 ms, min = 5.323 us, total = 584.548 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 996 total (0 active), Execution time: mean = 50.112 us, total = 49.911 ms, Queueing time: mean = 91.945 us, max = 237.873 us, min = 6.906 us, total = 91.577 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 498 total (1 active), 
Execution time: mean = 1.702 ms, total = 847.794 ms, Queueing time: mean = 64.653 us, max = 1.632 ms, min = 11.175 us, total = 32.197 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 83 total (1 active, 1 running), Execution time: mean = 2.592 ms, total = 215.145 ms, Queueing time: mean = 60.709 us, max = 172.215 us, min = 16.311 us, total = 5.039 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:36:16,221 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:36:16,437 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 439942 total (35 active) [state-dump] Queueing time: mean = 8.150 ms, max 
= 590.169 s, min = -0.000 s, total = 3585.586 s [state-dump] Execution time: mean = 11.065 ms, total = 4868.159 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 105798 total (0 active), Execution time: mean = 32.203 us, total = 3.407 s, Queueing time: mean = 95.117 us, max = 23.460 ms, min = 1.733 us, total = 10.063 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 105798 total (0 active), Execution time: mean = 463.086 us, total = 48.994 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 50358 total (1 active), Execution time: mean = 9.554 us, total = 481.140 ms, Queueing time: mean = 81.263 us, max = 55.955 ms, min = -0.000 s, total = 4.092 s [state-dump] NodeManager.CheckGC - 50358 total (1 active), Execution time: mean = 3.058 us, total = 153.999 ms, Queueing time: mean = 86.873 us, max = 55.958 ms, min = 3.126 us, total = 4.375 s [state-dump] ObjectManager.UpdateAvailableMemory - 50357 total (0 active), Execution time: mean = 5.124 us, total = 258.024 ms, Queueing time: mean = 89.536 us, max = 1.104 ms, min = 2.197 us, total = 4.509 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25191 total (1 active), Execution time: mean = 16.435 us, total = 414.024 ms, Queueing time: mean = 67.331 us, max = 41.182 ms, min = -0.000 s, total = 1.696 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20121 total (1 active), Execution time: mean = 433.465 us, total = 8.722 s, Queueing time: mean = 65.473 us, max = 13.366 ms, min = 93.000 ns, total = 1.317 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5040 total (1 active), Execution time: mean = 8.454 us, total = 42.610 ms, Queueing time: mean = 170.731 us, max = 3.537 ms, min = -0.000 s, total = 860.482 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5040 total (1 active), Execution time: mean = 14.938 us, 
total = 75.286 ms, Queueing time: mean = 62.027 us, max = 2.658 ms, min = 7.553 us, total = 312.615 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5040 total (1 active), Execution time: mean = 3.183 us, total = 16.040 ms, Queueing time: mean = 174.213 us, max = 3.551 ms, min = 2.496 us, total = 878.035 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5038 total (0 active), Execution time: mean = 559.566 us, total = 2.819 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5038 total (0 active), Execution time: mean = 99.074 us, total = 499.137 ms, Queueing time: mean = 94.309 us, max = 2.573 ms, min = 6.667 us, total = 475.127 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1681 total (1 active), Execution time: mean = 8.067 us, total = 13.560 ms, Queueing time: mean = 64.779 us, max = 496.804 us, min = 11.179 us, total = 108.893 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1008 total (1 active), Execution time: mean = 520.013 us, total = 524.173 ms, Queueing time: mean = 365.541 us, max = 2.197 ms, min = 6.917 us, total = 368.466 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1008 total (0 active), Execution time: mean = 1.393 ms, total = 1.404 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1008 total (1 active), Execution time: mean = 297.317 us, total = 299.696 ms, Queueing time: mean = 587.849 us, max = 2.307 ms, min = 5.323 us, total = 592.552 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1008 total (0 active), Execution time: mean = 50.215 us, total = 50.617 ms, Queueing time: mean = 92.245 us, max = 237.873 us, min = 6.906 us, total = 92.983 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 504 total (1 active), 
Execution time: mean = 1.704 ms, total = 858.582 ms, Queueing time: mean = 64.894 us, max = 1.632 ms, min = 11.175 us, total = 32.707 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 84 total (1 active, 1 running), Execution time: mean = 2.595 ms, total = 218.003 ms, Queueing time: mean = 60.960 us, max = 172.215 us, min = 16.311 us, total = 5.121 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:37:16,222 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:37:16,440 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 445174 total (35 active) [state-dump] Queueing time: mean = 8.055 ms, max 
= 590.169 s, min = -0.000 s, total = 3585.898 s [state-dump] Execution time: mean = 10.937 ms, total = 4868.882 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 107058 total (0 active), Execution time: mean = 32.135 us, total = 3.440 s, Queueing time: mean = 94.957 us, max = 23.460 ms, min = 1.733 us, total = 10.166 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 107058 total (0 active), Execution time: mean = 462.324 us, total = 49.496 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 50957 total (1 active), Execution time: mean = 9.550 us, total = 486.651 ms, Queueing time: mean = 81.166 us, max = 55.955 ms, min = -0.000 s, total = 4.136 s [state-dump] NodeManager.CheckGC - 50957 total (1 active), Execution time: mean = 3.057 us, total = 155.796 ms, Queueing time: mean = 86.773 us, max = 55.958 ms, min = 3.126 us, total = 4.422 s [state-dump] ObjectManager.UpdateAvailableMemory - 50956 total (0 active), Execution time: mean = 5.117 us, total = 260.747 ms, Queueing time: mean = 89.326 us, max = 1.104 ms, min = 2.197 us, total = 4.552 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25491 total (1 active), Execution time: mean = 16.420 us, total = 418.563 ms, Queueing time: mean = 67.232 us, max = 41.182 ms, min = -0.000 s, total = 1.714 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20361 total (1 active), Execution time: mean = 433.205 us, total = 8.820 s, Queueing time: mean = 65.497 us, max = 13.366 ms, min = 93.000 ns, total = 1.334 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5100 total (1 active), Execution time: mean = 8.450 us, total = 43.096 ms, Queueing time: mean = 170.713 us, max = 3.537 ms, min = -0.000 s, total = 870.634 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5100 total (1 active), Execution time: mean = 14.927 us, 
total = 76.126 ms, Queueing time: mean = 61.950 us, max = 2.658 ms, min = 7.553 us, total = 315.943 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5100 total (1 active), Execution time: mean = 3.186 us, total = 16.251 ms, Queueing time: mean = 174.187 us, max = 3.551 ms, min = 2.496 us, total = 888.352 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5098 total (0 active), Execution time: mean = 558.533 us, total = 2.847 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5098 total (0 active), Execution time: mean = 98.921 us, total = 504.297 ms, Queueing time: mean = 94.201 us, max = 2.573 ms, min = 6.667 us, total = 480.234 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1701 total (1 active), Execution time: mean = 8.063 us, total = 13.715 ms, Queueing time: mean = 64.692 us, max = 496.804 us, min = 11.179 us, total = 110.040 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1020 total (1 active), Execution time: mean = 520.346 us, total = 530.753 ms, Queueing time: mean = 365.147 us, max = 2.197 ms, min = 6.917 us, total = 372.450 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1020 total (0 active), Execution time: mean = 1.392 ms, total = 1.420 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1020 total (1 active), Execution time: mean = 297.296 us, total = 303.242 ms, Queueing time: mean = 587.749 us, max = 2.307 ms, min = 5.323 us, total = 599.504 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1020 total (0 active), Execution time: mean = 50.218 us, total = 51.223 ms, Queueing time: mean = 92.300 us, max = 237.873 us, min = 6.906 us, total = 94.146 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 510 total (1 active), 
Execution time: mean = 1.704 ms, total = 869.112 ms, Queueing time: mean = 64.788 us, max = 1.632 ms, min = 11.175 us, total = 33.042 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 85 total (1 active, 1 running), Execution time: mean = 2.597 ms, total = 220.717 ms, Queueing time: mean = 61.106 us, max = 172.215 us, min = 16.311 us, total = 5.194 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:38:16,222 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:38:16,441 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 450409 total (35 active) [state-dump] Queueing time: mean = 7.962 ms, max 
= 590.169 s, min = -0.000 s, total = 3586.189 s [state-dump] Execution time: mean = 10.811 ms, total = 4869.575 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 108318 total (0 active), Execution time: mean = 32.070 us, total = 3.474 s, Queueing time: mean = 94.714 us, max = 23.460 ms, min = 1.733 us, total = 10.259 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 108318 total (0 active), Execution time: mean = 461.280 us, total = 49.965 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 51557 total (1 active), Execution time: mean = 9.535 us, total = 491.603 ms, Queueing time: mean = 80.989 us, max = 55.955 ms, min = -0.000 s, total = 4.176 s [state-dump] NodeManager.CheckGC - 51557 total (1 active), Execution time: mean = 3.056 us, total = 157.551 ms, Queueing time: mean = 86.584 us, max = 55.958 ms, min = 3.126 us, total = 4.464 s [state-dump] ObjectManager.UpdateAvailableMemory - 51556 total (0 active), Execution time: mean = 5.107 us, total = 263.271 ms, Queueing time: mean = 89.109 us, max = 1.104 ms, min = 2.197 us, total = 4.594 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25791 total (1 active), Execution time: mean = 16.409 us, total = 423.205 ms, Queueing time: mean = 67.124 us, max = 41.182 ms, min = -0.000 s, total = 1.731 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20601 total (1 active), Execution time: mean = 433.050 us, total = 8.921 s, Queueing time: mean = 65.366 us, max = 13.366 ms, min = 93.000 ns, total = 1.347 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5160 total (1 active), Execution time: mean = 8.436 us, total = 43.530 ms, Queueing time: mean = 170.771 us, max = 3.537 ms, min = -0.000 s, total = 881.176 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5160 total (1 active), Execution time: mean = 14.894 us, 
total = 76.853 ms, Queueing time: mean = 61.776 us, max = 2.658 ms, min = 7.553 us, total = 318.765 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5160 total (1 active), Execution time: mean = 3.183 us, total = 16.423 ms, Queueing time: mean = 174.237 us, max = 3.551 ms, min = 2.496 us, total = 899.061 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5158 total (0 active), Execution time: mean = 557.558 us, total = 2.876 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5158 total (0 active), Execution time: mean = 98.766 us, total = 509.433 ms, Queueing time: mean = 93.997 us, max = 2.573 ms, min = 6.667 us, total = 484.835 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1721 total (1 active), Execution time: mean = 8.057 us, total = 13.867 ms, Queueing time: mean = 64.517 us, max = 496.804 us, min = 11.179 us, total = 111.034 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1032 total (1 active), Execution time: mean = 521.242 us, total = 537.922 ms, Queueing time: mean = 364.577 us, max = 2.197 ms, min = 6.917 us, total = 376.243 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1032 total (0 active), Execution time: mean = 1.391 ms, total = 1.435 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1032 total (1 active), Execution time: mean = 297.007 us, total = 306.511 ms, Queueing time: mean = 588.392 us, max = 2.307 ms, min = 5.323 us, total = 607.220 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1032 total (0 active), Execution time: mean = 50.228 us, total = 51.835 ms, Queueing time: mean = 92.192 us, max = 237.873 us, min = 6.906 us, total = 95.143 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 516 total (1 active), 
Execution time: mean = 1.705 ms, total = 879.733 ms, Queueing time: mean = 64.642 us, max = 1.632 ms, min = 11.175 us, total = 33.356 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 86 total (1 active, 1 running), Execution time: mean = 2.600 ms, total = 223.625 ms, Queueing time: mean = 61.503 us, max = 172.215 us, min = 16.311 us, total = 5.289 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-21 06:39:16,222 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:39:16,444 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 455640 total (35 active) [state-dump] Queueing time: mean = 7.871 ms, max 
= 590.169 s, min = -0.000 s, total = 3586.555 s [state-dump] Execution time: mean = 10.689 ms, total = 4870.401 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 109578 total (0 active), Execution time: mean = 32.041 us, total = 3.511 s, Queueing time: mean = 94.786 us, max = 23.460 ms, min = 1.733 us, total = 10.386 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 109578 total (0 active), Execution time: mean = 461.341 us, total = 50.553 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 52156 total (1 active), Execution time: mean = 9.542 us, total = 497.670 ms, Queueing time: mean = 80.993 us, max = 55.955 ms, min = -0.000 s, total = 4.224 s [state-dump] NodeManager.CheckGC - 52156 total (1 active), Execution time: mean = 3.058 us, total = 159.511 ms, Queueing time: mean = 86.591 us, max = 55.958 ms, min = 3.126 us, total = 4.516 s [state-dump] ObjectManager.UpdateAvailableMemory - 52155 total (0 active), Execution time: mean = 5.113 us, total = 266.657 ms, Queueing time: mean = 89.188 us, max = 1.104 ms, min = 2.197 us, total = 4.652 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26091 total (1 active), Execution time: mean = 16.433 us, total = 428.756 ms, Queueing time: mean = 67.161 us, max = 41.182 ms, min = -0.000 s, total = 1.752 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20840 total (1 active), Execution time: mean = 433.074 us, total = 9.025 s, Queueing time: mean = 65.412 us, max = 13.366 ms, min = 93.000 ns, total = 1.363 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5220 total (1 active), Execution time: mean = 8.434 us, total = 44.028 ms, Queueing time: mean = 170.644 us, max = 3.537 ms, min = -0.000 s, total = 890.762 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5220 total (1 active), Execution time: mean = 14.894 us, 
total = 77.745 ms, Queueing time: mean = 61.786 us, max = 2.658 ms, min = 7.553 us, total = 322.521 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5220 total (1 active), Execution time: mean = 3.180 us, total = 16.602 ms, Queueing time: mean = 174.111 us, max = 3.551 ms, min = 2.496 us, total = 908.861 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5218 total (0 active), Execution time: mean = 557.964 us, total = 2.911 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5218 total (0 active), Execution time: mean = 98.763 us, total = 515.344 ms, Queueing time: mean = 94.079 us, max = 2.573 ms, min = 6.667 us, total = 490.902 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1741 total (1 active), Execution time: mean = 8.066 us, total = 14.043 ms, Queueing time: mean = 64.625 us, max = 496.804 us, min = 11.179 us, total = 112.513 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1044 total (1 active), Execution time: mean = 520.714 us, total = 543.625 ms, Queueing time: mean = 364.417 us, max = 2.197 ms, min = 6.917 us, total = 380.452 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1044 total (0 active), Execution time: mean = 1.390 ms, total = 1.451 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1044 total (1 active), Execution time: mean = 296.702 us, total = 309.757 ms, Queueing time: mean = 587.923 us, max = 2.307 ms, min = 5.323 us, total = 613.792 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1044 total (0 active), Execution time: mean = 50.197 us, total = 52.406 ms, Queueing time: mean = 92.108 us, max = 237.873 us, min = 6.906 us, total = 96.161 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 522 total (1 active), 
Execution time: mean = 1.704 ms, total = 889.447 ms, Queueing time: mean = 64.745 us, max = 1.632 ms, min = 11.175 us, total = 33.797 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 87 total (1 active, 1 running), Execution time: mean = 2.582 ms, total = 224.594 ms, Queueing time: mean = 60.954 us, max = 172.215 us, min = 13.784 us, total = 5.303 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:40:16,223 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:40:16,447 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 460875 total (35 active) [state-dump] Queueing time: mean = 7.783 ms, max 
= 590.169 s, min = -0.000 s, total = 3586.882 s [state-dump] Execution time: mean = 10.569 ms, total = 4871.169 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 110838 total (0 active), Execution time: mean = 31.992 us, total = 3.546 s, Queueing time: mean = 94.769 us, max = 23.460 ms, min = 1.733 us, total = 10.504 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 110838 total (0 active), Execution time: mean = 460.950 us, total = 51.091 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 52756 total (1 active), Execution time: mean = 9.529 us, total = 502.710 ms, Queueing time: mean = 80.891 us, max = 55.955 ms, min = -0.000 s, total = 4.268 s [state-dump] NodeManager.CheckGC - 52756 total (1 active), Execution time: mean = 3.055 us, total = 161.169 ms, Queueing time: mean = 86.480 us, max = 55.958 ms, min = 3.126 us, total = 4.562 s [state-dump] ObjectManager.UpdateAvailableMemory - 52755 total (0 active), Execution time: mean = 5.103 us, total = 269.218 ms, Queueing time: mean = 88.897 us, max = 1.104 ms, min = 2.040 us, total = 4.690 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26391 total (1 active), Execution time: mean = 16.427 us, total = 433.518 ms, Queueing time: mean = 67.102 us, max = 41.182 ms, min = -0.000 s, total = 1.771 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21080 total (1 active), Execution time: mean = 432.875 us, total = 9.125 s, Queueing time: mean = 65.539 us, max = 13.366 ms, min = 93.000 ns, total = 1.382 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5280 total (1 active), Execution time: mean = 8.427 us, total = 44.492 ms, Queueing time: mean = 170.740 us, max = 3.537 ms, min = -0.000 s, total = 901.506 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5280 total (1 active), Execution time: mean = 14.869 us, 
total = 78.506 ms, Queueing time: mean = 61.829 us, max = 2.658 ms, min = 7.553 us, total = 326.459 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5280 total (1 active), Execution time: mean = 3.179 us, total = 16.785 ms, Queueing time: mean = 174.203 us, max = 3.551 ms, min = 2.496 us, total = 919.794 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5278 total (0 active), Execution time: mean = 557.941 us, total = 2.945 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5278 total (0 active), Execution time: mean = 98.721 us, total = 521.051 ms, Queueing time: mean = 94.051 us, max = 2.573 ms, min = 6.667 us, total = 496.401 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1761 total (1 active), Execution time: mean = 8.063 us, total = 14.198 ms, Queueing time: mean = 64.461 us, max = 496.804 us, min = 11.179 us, total = 113.517 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1056 total (1 active), Execution time: mean = 521.138 us, total = 550.321 ms, Queueing time: mean = 364.208 us, max = 2.197 ms, min = 6.917 us, total = 384.604 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1056 total (0 active), Execution time: mean = 1.390 ms, total = 1.468 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1056 total (1 active), Execution time: mean = 296.792 us, total = 313.412 ms, Queueing time: mean = 588.066 us, max = 2.307 ms, min = 5.323 us, total = 620.997 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1056 total (0 active), Execution time: mean = 50.224 us, total = 53.036 ms, Queueing time: mean = 91.965 us, max = 237.873 us, min = 6.906 us, total = 97.115 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 528 total (1 active), 
Execution time: mean = 1.704 ms, total = 899.729 ms, Queueing time: mean = 64.671 us, max = 1.632 ms, min = 11.175 us, total = 34.146 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 88 total (1 active, 1 running), Execution time: mean = 2.584 ms, total = 227.419 ms, Queueing time: mean = 61.047 us, max = 172.215 us, min = 13.784 us, total = 5.372 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:41:16,223 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:41:16,450 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 466106 total (35 active) [state-dump] Queueing time: mean = 7.696 ms, max 
= 590.169 s, min = -0.000 s, total = 3587.216 s [state-dump] Execution time: mean = 10.452 ms, total = 4871.936 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 112098 total (0 active), Execution time: mean = 31.962 us, total = 3.583 s, Queueing time: mean = 94.704 us, max = 23.460 ms, min = 1.733 us, total = 10.616 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 112098 total (0 active), Execution time: mean = 460.527 us, total = 51.624 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 53355 total (1 active), Execution time: mean = 9.537 us, total = 508.863 ms, Queueing time: mean = 80.900 us, max = 55.955 ms, min = -0.000 s, total = 4.316 s [state-dump] NodeManager.CheckGC - 53355 total (1 active), Execution time: mean = 3.057 us, total = 163.084 ms, Queueing time: mean = 86.496 us, max = 55.958 ms, min = 3.126 us, total = 4.615 s [state-dump] ObjectManager.UpdateAvailableMemory - 53354 total (0 active), Execution time: mean = 5.102 us, total = 272.221 ms, Queueing time: mean = 88.827 us, max = 1.104 ms, min = 2.040 us, total = 4.739 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26691 total (1 active), Execution time: mean = 16.422 us, total = 438.329 ms, Queueing time: mean = 67.015 us, max = 41.182 ms, min = -0.000 s, total = 1.789 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21319 total (1 active), Execution time: mean = 432.913 us, total = 9.229 s, Queueing time: mean = 65.484 us, max = 13.366 ms, min = 93.000 ns, total = 1.396 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5340 total (1 active), Execution time: mean = 8.417 us, total = 44.948 ms, Queueing time: mean = 170.420 us, max = 3.537 ms, min = -0.000 s, total = 910.043 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5340 total (1 active), Execution time: mean = 14.875 us, 
total = 79.432 ms, Queueing time: mean = 62.065 us, max = 2.658 ms, min = 7.553 us, total = 331.427 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5340 total (1 active), Execution time: mean = 3.177 us, total = 16.966 ms, Queueing time: mean = 173.880 us, max = 3.551 ms, min = 2.496 us, total = 928.518 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5338 total (0 active), Execution time: mean = 557.509 us, total = 2.976 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5338 total (0 active), Execution time: mean = 98.654 us, total = 526.613 ms, Queueing time: mean = 94.037 us, max = 2.573 ms, min = 6.667 us, total = 501.969 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1781 total (1 active), Execution time: mean = 8.071 us, total = 14.374 ms, Queueing time: mean = 64.447 us, max = 496.804 us, min = 11.179 us, total = 114.781 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1068 total (1 active), Execution time: mean = 521.301 us, total = 556.749 ms, Queueing time: mean = 362.848 us, max = 2.197 ms, min = 6.917 us, total = 387.522 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1068 total (0 active), Execution time: mean = 1.389 ms, total = 1.483 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 1068 total (1 active), Execution time: mean = 297.129 us, total = 317.334 ms, Queueing time: mean = 586.564 us, max = 2.307 ms, min = 5.323 us, total = 626.450 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1068 total (0 active), Execution time: mean = 50.238 us, total = 53.654 ms, Queueing time: mean = 91.771 us, max = 237.873 us, min = 6.906 us, total = 98.011 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 534 total (1 active), 
Execution time: mean = 1.702 ms, total = 908.952 ms, Queueing time: mean = 64.640 us, max = 1.632 ms, min = 11.175 us, total = 34.518 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 89 total (1 active, 1 running), Execution time: mean = 2.586 ms, total = 230.170 ms, Queueing time: mean = 61.331 us, max = 172.215 us, min = 13.784 us, total = 5.458 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing 
time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.876 s, total = 4798.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 345.521 
us, total = 3.110 ms, Queueing time: mean = 69.585 us, max = 184.802 us, min = 20.320 us, total = 626.263 us [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 6.487 us, total = 38.923 us, Queueing time: mean = 47.721 us, max = 79.050 us, min = 34.595 us, total = 286.323 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:42:16,223 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:42:16,286 I 16746 16746] (raylet) node_manager.cc:658: Sending Python GC request to 21 local workers to clean up Python cyclic references. [2025-01-21 06:42:16,453 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, 
"labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 
[state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 471385 total (36 active) [state-dump] Queueing time: mean = 7.611 ms, max 
= 590.169 s, min = -0.000 s, total = 3587.486 s [state-dump] Execution time: mean = 11.611 ms, total = 5473.321 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 113358 total (0 active), Execution time: mean = 458.101 us, total = 51.929 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 113358 total (0 active), Execution time: mean = 31.817 us, total = 3.607 s, Queueing time: mean = 94.024 us, max = 23.460 ms, min = 1.733 us, total = 10.658 s [state-dump] RaySyncer.OnDemandBroadcasting - 53955 total (1 active), Execution time: mean = 9.510 us, total = 513.109 ms, Queueing time: mean = 80.544 us, max = 55.955 ms, min = -0.000 s, total = 4.346 s [state-dump] NodeManager.CheckGC - 53955 total (1 active), Execution time: mean = 3.957 us, total = 213.477 ms, Queueing time: mean = 86.119 us, max = 55.958 ms, min = 3.126 us, total = 4.647 s [state-dump] ObjectManager.UpdateAvailableMemory - 53954 total (0 active), Execution time: mean = 5.081 us, total = 274.134 ms, Queueing time: mean = 89.032 us, max = 48.698 ms, min = 2.040 us, total = 4.804 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26991 total (1 active), Execution time: mean = 16.377 us, total = 442.034 ms, Queueing time: mean = 66.680 us, max = 41.182 ms, min = -0.000 s, total = 1.800 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21559 total (1 active), Execution time: mean = 432.611 us, total = 9.327 s, Queueing time: mean = 65.220 us, max = 13.366 ms, min = 93.000 ns, total = 1.406 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5400 total (1 active), Execution time: mean = 8.407 us, total = 45.397 ms, Queueing time: mean = 170.514 us, max = 3.537 ms, min = -0.000 s, total = 920.775 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5400 total (1 active), Execution time: mean = 14.835 us, 
total = 80.109 ms, Queueing time: mean = 61.721 us, max = 2.658 ms, min = 7.553 us, total = 333.292 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5400 total (1 active), Execution time: mean = 3.180 us, total = 17.171 ms, Queueing time: mean = 173.962 us, max = 3.551 ms, min = 2.496 us, total = 939.393 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5398 total (0 active), Execution time: mean = 554.780 us, total = 2.995 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5398 total (0 active), Execution time: mean = 98.426 us, total = 531.303 ms, Queueing time: mean = 93.260 us, max = 2.573 ms, min = 6.667 us, total = 503.418 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1801 total (1 active), Execution time: mean = 8.053 us, total = 14.504 ms, Queueing time: mean = 65.452 us, max = 2.386 ms, min = 11.179 us, total = 117.880 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1080 total (0 active), Execution time: mean = 50.131 us, total = 54.141 ms, Queueing time: mean = 91.188 us, max = 237.873 us, min = 6.906 us, total = 98.483 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1080 total (1 active), Execution time: mean = 521.264 us, total = 562.965 ms, Queueing time: mean = 363.289 us, max = 2.197 ms, min = 6.917 us, total = 392.353 ms [state-dump] NodeManager.GcsCheckAlive - 1080 total (1 active), Execution time: mean = 297.007 us, total = 320.768 ms, Queueing time: mean = 587.079 us, max = 2.307 ms, min = 5.323 us, total = 634.046 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1080 total (0 active), Execution time: mean = 1.385 ms, total = 1.496 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 540 total (1 active), 
Execution time: mean = 1.703 ms, total = 919.678 ms, Queueing time: mean = 64.536 us, max = 1.632 ms, min = 11.175 us, total = 34.849 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 90 total (1 active, 1 running), Execution time: mean = 2.590 ms, total = 233.115 ms, Queueing time: mean = 61.405 us, max = 172.215 us, min = 13.784 us, total = 5.526 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (1 active), Execution time: mean = 39.788 ms, total = 835.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 
26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 20 total (0 active), Execution time: mean = 22.413 us, total = 448.251 us, Queueing time: mean = 1.962 ms, max = 18.580 ms, min = 8.816 us, total = 39.247 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 
ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 
us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:43:16,223 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:43:16,456 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], 
memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] 
ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] 
- num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma 
notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 476619 total (35 active) [state-dump] Queueing time: mean = 7.527 ms, max = 590.169 s, min = -0.000 s, total = 3587.641 s [state-dump] Execution time: mean = 11.485 ms, total = 5473.949 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 114618 total (0 active), Execution time: mean = 455.654 us, total = 52.226 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 114618 total (0 active), Execution time: mean = 31.664 us, total = 3.629 s, Queueing time: mean = 93.302 us, max = 23.460 ms, min = 1.733 us, total = 10.694 s [state-dump] RaySyncer.OnDemandBroadcasting - 54555 total (1 active), Execution time: mean = 9.477 us, total = 517.020 ms, Queueing time: mean = 80.146 us, max = 55.955 ms, min = -0.000 s, total = 4.372 s [state-dump] ObjectManager.UpdateAvailableMemory - 54554 total (0 active), Execution time: mean = 5.057 us, total = 275.906 ms, Queueing time: mean = 88.325 us, max = 48.698 ms, min = 2.040 us, total = 4.818 s [state-dump] NodeManager.CheckGC - 54554 total (1 active), Execution time: mean = 3.978 us, total = 217.007 ms, Queueing time: mean = 85.583 us, max = 55.958 ms, min = 3.126 us, total = 4.669 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27291 total (1 active), Execution time: mean = 16.318 us, total = 445.323 ms, Queueing time: mean = 66.324 us, max = 41.182 ms, min = -0.000 s, total = 1.810 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21798 total (1 active), Execution time: mean = 432.295 us, total = 9.423 s, Queueing time: mean = 64.855 us, max = 13.366 ms, min = 93.000 ns, total = 1.414 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5460 total (1 active), Execution time: mean = 8.380 us, total = 45.757 ms, Queueing time: 
mean = 170.605 us, max = 3.537 ms, min = -0.000 s, total = 931.505 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5460 total (1 active), Execution time: mean = 14.783 us, total = 80.715 ms, Queueing time: mean = 61.367 us, max = 2.658 ms, min = 7.553 us, total = 335.062 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5460 total (1 active), Execution time: mean = 3.177 us, total = 17.344 ms, Queueing time: mean = 174.040 us, max = 3.551 ms, min = 2.496 us, total = 950.258 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5458 total (0 active), Execution time: mean = 98.244 us, total = 536.218 ms, Queueing time: mean = 92.487 us, max = 2.573 ms, min = 6.667 us, total = 504.796 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5458 total (0 active), Execution time: mean = 552.133 us, total = 3.014 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1821 total (1 active), Execution time: mean = 8.035 us, total = 14.631 ms, Queueing time: mean = 65.200 us, max = 2.386 ms, min = 11.179 us, total = 118.730 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1092 total (0 active), Execution time: mean = 1.381 ms, total = 1.508 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1092 total (0 active), Execution time: mean = 50.035 us, total = 54.638 ms, Queueing time: mean = 90.462 us, max = 237.873 us, min = 6.906 us, total = 98.785 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1092 total (1 active), Execution time: mean = 521.596 us, total = 569.583 ms, Queueing time: mean = 363.337 us, max = 2.197 ms, min = 6.917 us, total = 396.764 ms [state-dump] NodeManager.GcsCheckAlive - 1092 total (1 active), Execution time: mean = 296.695 us, total = 
323.991 ms, Queueing time: mean = 587.768 us, max = 2.307 ms, min = 5.323 us, total = 641.842 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 546 total (1 active), Execution time: mean = 1.704 ms, total = 930.208 ms, Queueing time: mean = 64.130 us, max = 1.632 ms, min = 11.175 us, total = 35.015 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 91 total (1 active, 1 running), Execution time: mean = 2.594 ms, total = 236.078 ms, Queueing time: mean = 61.017 us, max = 172.215 us, min = 13.784 us, total = 5.553 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, 
total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 
20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, 
total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:44:16,223 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:44:16,459 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: 
{"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 
0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request 
bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] 
- active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 481854 total (35 active) [state-dump] Queueing time: mean = 7.446 ms, max = 590.169 s, min = -0.000 s, total = 3587.869 s [state-dump] Execution time: mean = 11.361 ms, total = 5474.557 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 115878 total (0 active), Execution time: mean = 454.143 us, total = 52.625 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 115878 total (0 active), Execution time: mean = 31.567 us, total = 3.658 s, Queueing time: mean = 92.886 us, max = 23.460 ms, min = 1.733 us, total = 10.763 s [state-dump] RaySyncer.OnDemandBroadcasting - 55155 total (1 active), Execution time: mean = 9.457 us, total = 521.623 ms, Queueing time: mean = 79.892 us, max = 55.955 ms, min = -0.000 s, total = 4.406 s [state-dump] ObjectManager.UpdateAvailableMemory - 55154 total (0 active), Execution time: mean = 5.044 us, total = 278.187 ms, Queueing time: mean = 87.986 us, max = 48.698 ms, min = 2.040 us, total = 4.853 s [state-dump] NodeManager.CheckGC - 55154 total (1 active), Execution time: mean = 4.006 us, total = 220.938 ms, Queueing time: mean = 85.174 us, max = 55.958 ms, min = 3.126 us, total = 4.698 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27591 total (1 active), Execution time: mean = 16.272 us, total = 448.973 ms, Queueing time: mean = 66.085 us, max = 41.182 ms, min = -0.000 s, total = 1.823 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22038 total (1 active), Execution time: mean = 432.104 us, total = 9.523 s, Queueing time: mean = 64.668 us, max = 13.366 ms, min = 93.000 ns, total = 
1.425 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5520 total (1 active), Execution time: mean = 8.355 us, total = 46.120 ms, Queueing time: mean = 170.470 us, max = 3.537 ms, min = -0.000 s, total = 940.996 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5520 total (1 active), Execution time: mean = 14.752 us, total = 81.431 ms, Queueing time: mean = 61.130 us, max = 2.658 ms, min = 7.553 us, total = 337.437 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5520 total (1 active), Execution time: mean = 3.173 us, total = 17.513 ms, Queueing time: mean = 173.892 us, max = 3.551 ms, min = 2.496 us, total = 959.881 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5518 total (0 active), Execution time: mean = 98.089 us, total = 541.255 ms, Queueing time: mean = 92.073 us, max = 2.573 ms, min = 6.667 us, total = 508.060 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5518 total (0 active), Execution time: mean = 550.560 us, total = 3.038 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1841 total (1 active), Execution time: mean = 8.014 us, total = 14.755 ms, Queueing time: mean = 64.929 us, max = 2.386 ms, min = 11.179 us, total = 119.535 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1104 total (0 active), Execution time: mean = 1.378 ms, total = 1.521 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1104 total (0 active), Execution time: mean = 49.934 us, total = 55.127 ms, Queueing time: mean = 90.139 us, max = 237.873 us, min = 6.906 us, total = 99.514 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1104 total (1 active), Execution time: mean = 521.441 us, total = 575.671 ms, Queueing time: mean = 362.555 us, 
max = 2.197 ms, min = 6.917 us, total = 400.260 ms [state-dump] NodeManager.GcsCheckAlive - 1104 total (1 active), Execution time: mean = 296.376 us, total = 327.200 ms, Queueing time: mean = 587.219 us, max = 2.307 ms, min = 5.323 us, total = 648.290 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 552 total (1 active), Execution time: mean = 1.702 ms, total = 939.490 ms, Queueing time: mean = 63.872 us, max = 1.632 ms, min = 11.175 us, total = 35.257 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 92 total (1 active, 1 running), Execution time: mean = 2.598 ms, total = 239.029 ms, Queueing time: mean = 60.747 us, max = 172.215 us, min = 13.784 us, total = 5.589 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:45:16,224 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:45:16,462 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 487087 total (35 active) [state-dump] Queueing time: mean = 7.367 ms, max = 590.169 s, min = -0.000 s, total = 3588.199 s [state-dump] Execution time: mean = 11.241 ms, total = 5475.373 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 117138 total (0 active), Execution time: mean = 454.150 us, total = 53.198 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 117138 total (0 active), Execution time: mean = 31.541 us, total = 3.695 s, Queueing time: mean = 92.877 us, max = 23.460 ms, min = 1.508 us, total = 10.879 s [state-dump] RaySyncer.OnDemandBroadcasting - 55754 total (1 active), Execution time: mean = 9.456 us, total = 527.193 ms, Queueing time: mean = 79.890 us, max = 55.955 ms, min = -0.000 s, total = 4.454 s [state-dump] NodeManager.CheckGC - 55754 total (1 active), Execution time: mean = 4.045 us, total = 225.508 ms, Queueing time: mean = 84.909 us, max = 55.958 ms, min = 3.126 us, total = 4.734 s [state-dump] ObjectManager.UpdateAvailableMemory - 55753 total (0 active), Execution time: mean = 5.045 us, total = 281.276 ms, Queueing time: mean = 87.951 us, max = 48.698 ms, min = 2.040 us, total = 4.904 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27891 total (1 active), Execution time: mean = 16.263 us, total = 453.586 ms, Queueing time: mean = 66.009 
us, max = 41.182 ms, min = -0.000 s, total = 1.841 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22278 total (1 active), Execution time: mean = 432.167 us, total = 9.628 s, Queueing time: mean = 64.675 us, max = 13.366 ms, min = 93.000 ns, total = 1.441 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5580 total (1 active), Execution time: mean = 8.349 us, total = 46.589 ms, Queueing time: mean = 170.519 us, max = 3.537 ms, min = -0.000 s, total = 951.498 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5580 total (1 active), Execution time: mean = 14.741 us, total = 82.253 ms, Queueing time: mean = 61.043 us, max = 2.658 ms, min = 7.553 us, total = 340.622 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5580 total (1 active), Execution time: mean = 3.172 us, total = 17.701 ms, Queueing time: mean = 173.938 us, max = 3.551 ms, min = 2.496 us, total = 970.576 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5578 total (0 active), Execution time: mean = 98.097 us, total = 547.183 ms, Queueing time: mean = 92.187 us, max = 2.573 ms, min = 6.667 us, total = 514.221 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5578 total (0 active), Execution time: mean = 550.939 us, total = 3.073 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1861 total (1 active), Execution time: mean = 8.021 us, total = 14.928 ms, Queueing time: mean = 64.936 us, max = 2.386 ms, min = 11.179 us, total = 120.846 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1116 total (0 active), Execution time: mean = 1.378 ms, total = 1.538 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1116 total (0 active), Execution time: mean = 49.944 us, total 
= 55.738 ms, Queueing time: mean = 90.376 us, max = 237.873 us, min = 6.906 us, total = 100.859 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1116 total (1 active), Execution time: mean = 521.607 us, total = 582.113 ms, Queueing time: mean = 362.530 us, max = 2.197 ms, min = 6.917 us, total = 404.584 ms [state-dump] NodeManager.GcsCheckAlive - 1116 total (1 active), Execution time: mean = 296.328 us, total = 330.702 ms, Queueing time: mean = 587.447 us, max = 2.307 ms, min = 5.323 us, total = 655.591 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 558 total (1 active), Execution time: mean = 1.703 ms, total = 950.111 ms, Queueing time: mean = 63.741 us, max = 1.632 ms, min = 11.175 us, total = 35.567 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 93 total (1 active, 1 running), Execution time: mean = 2.603 ms, total = 242.075 ms, Queueing time: mean = 60.273 us, max = 172.215 us, min = 13.784 us, total = 5.605 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total 
= 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:46:16,224 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:46:16,465 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 492320 total (35 active) [state-dump] Queueing time: mean = 7.289 ms, max = 590.169 s, min = -0.000 s, total = 3588.554 s [state-dump] Execution time: mean = 11.123 ms, total = 5476.215 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 118398 total (0 active), Execution time: mean = 454.341 us, total = 53.793 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 118398 total (0 active), Execution time: mean = 31.533 us, total = 3.733 s, Queueing time: mean = 92.966 us, max = 23.460 ms, min = 1.508 us, total = 11.007 s [state-dump] RaySyncer.OnDemandBroadcasting - 56354 total (1 active), Execution time: mean = 9.454 us, total = 532.776 ms, Queueing time: mean = 79.886 us, max = 55.955 ms, min = -0.000 s, total = 4.502 s [state-dump] ObjectManager.UpdateAvailableMemory - 56353 total (0 active), Execution time: mean = 5.049 us, total = 284.542 ms, Queueing time: mean = 88.017 us, max = 48.698 ms, min = 2.040 us, total = 4.960 s [state-dump] NodeManager.CheckGC - 56353 total (1 active), Execution time: mean = 4.080 us, total = 229.936 ms, Queueing time: mean = 84.673 us, max = 55.958 ms, min = 3.126 
us, total = 4.772 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 28191 total (1 active), Execution time: mean = 16.266 us, total = 458.556 ms, Queueing time: mean = 65.958 us, max = 41.182 ms, min = -0.000 s, total = 1.859 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22517 total (1 active), Execution time: mean = 432.170 us, total = 9.731 s, Queueing time: mean = 64.646 us, max = 13.366 ms, min = 93.000 ns, total = 1.456 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5640 total (1 active), Execution time: mean = 8.359 us, total = 47.143 ms, Queueing time: mean = 170.774 us, max = 3.537 ms, min = -0.000 s, total = 963.164 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5640 total (1 active), Execution time: mean = 14.757 us, total = 83.231 ms, Queueing time: mean = 61.058 us, max = 2.658 ms, min = 7.553 us, total = 344.368 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5640 total (1 active), Execution time: mean = 3.172 us, total = 17.888 ms, Queueing time: mean = 174.197 us, max = 3.551 ms, min = 2.496 us, total = 982.473 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5638 total (0 active), Execution time: mean = 98.190 us, total = 553.595 ms, Queueing time: mean = 92.232 us, max = 2.573 ms, min = 6.667 us, total = 520.002 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5638 total (0 active), Execution time: mean = 551.248 us, total = 3.108 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1881 total (1 active), Execution time: mean = 8.023 us, total = 15.091 ms, Queueing time: mean = 64.882 us, max = 2.386 ms, min = 11.179 us, total = 122.043 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1128 total (0 active), Execution time: mean = 1.379 ms, total = 1.556 s, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1128 total (0 active), Execution time: mean = 49.975 us, total = 56.372 ms, Queueing time: mean = 93.852 us, max = 3.960 ms, min = 6.906 us, total = 105.865 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1128 total (1 active), Execution time: mean = 522.154 us, total = 588.990 ms, Queueing time: mean = 363.325 us, max = 2.197 ms, min = 6.917 us, total = 409.830 ms [state-dump] NodeManager.GcsCheckAlive - 1128 total (1 active), Execution time: mean = 296.589 us, total = 334.552 ms, Queueing time: mean = 588.517 us, max = 2.307 ms, min = 5.323 us, total = 663.847 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 564 total (1 active), Execution time: mean = 1.705 ms, total = 961.719 ms, Queueing time: mean = 63.731 us, max = 1.632 ms, min = 11.175 us, total = 35.944 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 94 total (1 active, 1 running), Execution time: mean = 2.604 ms, total = 244.809 ms, Queueing time: mean = 60.162 us, max = 172.215 us, min = 13.784 us, total = 5.655 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 
856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:47:16,224 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:47:16,466 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 497549 total (35 active) [state-dump] Queueing time: mean = 7.213 ms, max = 590.169 s, min = -0.000 s, total = 3589.030 s [state-dump] Execution time: mean = 11.008 ms, total = 5477.047 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 119658 total (0 active), Execution time: mean = 454.514 us, total = 54.386 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 119658 total (0 active), Execution time: mean = 31.517 us, total = 3.771 s, Queueing time: mean = 93.015 us, max = 23.460 ms, min = 1.508 us, total = 11.130 s [state-dump] RaySyncer.OnDemandBroadcasting - 56952 total (1 active), Execution time: mean = 9.449 us, total = 538.134 ms, Queueing time: mean = 81.055 us, max = 65.085 ms, min = -0.000 s, total = 4.616 s [state-dump] NodeManager.CheckGC - 56952 total (1 active), Execution time: mean = 4.087 us, total = 232.755 ms, Queueing time: mean = 85.623 us, max = 60.039 ms, min = 3.126 us, total = 4.876 
s [state-dump] ObjectManager.UpdateAvailableMemory - 56951 total (0 active), Execution time: mean = 5.047 us, total = 287.457 ms, Queueing time: mean = 87.986 us, max = 48.698 ms, min = 2.040 us, total = 5.011 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 28490 total (1 active), Execution time: mean = 16.268 us, total = 463.486 ms, Queueing time: mean = 66.067 us, max = 41.182 ms, min = -0.000 s, total = 1.882 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22757 total (1 active), Execution time: mean = 432.112 us, total = 9.834 s, Queueing time: mean = 64.635 us, max = 13.366 ms, min = 93.000 ns, total = 1.471 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5700 total (1 active), Execution time: mean = 8.355 us, total = 47.622 ms, Queueing time: mean = 170.744 us, max = 3.537 ms, min = -0.000 s, total = 973.243 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5700 total (1 active), Execution time: mean = 14.768 us, total = 84.176 ms, Queueing time: mean = 61.085 us, max = 2.658 ms, min = 7.553 us, total = 348.186 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5700 total (1 active), Execution time: mean = 3.172 us, total = 18.082 ms, Queueing time: mean = 174.163 us, max = 3.551 ms, min = 2.496 us, total = 992.728 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5698 total (0 active), Execution time: mean = 98.192 us, total = 559.496 ms, Queueing time: mean = 92.297 us, max = 2.573 ms, min = 6.667 us, total = 525.909 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5698 total (0 active), Execution time: mean = 551.347 us, total = 3.142 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1901 total (1 active), Execution time: mean = 8.028 us, total = 15.262 ms, Queueing time: mean = 64.786 us, max = 2.386 ms, min = 11.179 us, total = 
123.159 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1140 total (0 active), Execution time: mean = 1.380 ms, total = 1.573 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1140 total (0 active), Execution time: mean = 49.981 us, total = 56.978 ms, Queueing time: mean = 93.876 us, max = 3.960 ms, min = 6.906 us, total = 107.019 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1140 total (1 active), Execution time: mean = 522.071 us, total = 595.161 ms, Queueing time: mean = 363.344 us, max = 2.197 ms, min = 6.917 us, total = 414.212 ms [state-dump] NodeManager.GcsCheckAlive - 1140 total (1 active), Execution time: mean = 296.456 us, total = 337.960 ms, Queueing time: mean = 588.405 us, max = 2.307 ms, min = 5.323 us, total = 670.782 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 570 total (1 active), Execution time: mean = 1.706 ms, total = 972.307 ms, Queueing time: mean = 63.854 us, max = 1.632 ms, min = 11.175 us, total = 36.397 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 95 total (1 active, 1 running), Execution time: mean = 2.608 ms, total = 247.775 ms, Queueing time: mean = 60.035 us, max = 172.215 us, min = 13.784 us, total = 5.703 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:48:16,224 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:48:16,469 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 502783 total (35 active) [state-dump] Queueing time: mean = 7.139 ms, max 
= 590.169 s, min = -0.000 s, total = 3589.287 s [state-dump] Execution time: mean = 10.895 ms, total = 5477.689 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 120918 total (0 active), Execution time: mean = 453.330 us, total = 54.816 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 120918 total (0 active), Execution time: mean = 31.433 us, total = 3.801 s, Queueing time: mean = 92.673 us, max = 23.460 ms, min = 1.508 us, total = 11.206 s [state-dump] RaySyncer.OnDemandBroadcasting - 57552 total (1 active), Execution time: mean = 9.438 us, total = 543.172 ms, Queueing time: mean = 80.858 us, max = 65.085 ms, min = -0.000 s, total = 4.654 s [state-dump] NodeManager.CheckGC - 57552 total (1 active), Execution time: mean = 4.075 us, total = 234.541 ms, Queueing time: mean = 85.427 us, max = 60.039 ms, min = 3.126 us, total = 4.916 s [state-dump] ObjectManager.UpdateAvailableMemory - 57551 total (0 active), Execution time: mean = 5.036 us, total = 289.801 ms, Queueing time: mean = 87.510 us, max = 48.698 ms, min = 2.040 us, total = 5.036 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 28790 total (1 active), Execution time: mean = 16.252 us, total = 467.907 ms, Queueing time: mean = 65.932 us, max = 41.182 ms, min = -0.000 s, total = 1.898 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22996 total (1 active), Execution time: mean = 432.058 us, total = 9.936 s, Queueing time: mean = 64.568 us, max = 13.366 ms, min = 93.000 ns, total = 1.485 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5760 total (1 active), Execution time: mean = 8.342 us, total = 48.051 ms, Queueing time: mean = 170.902 us, max = 3.537 ms, min = -0.000 s, total = 984.394 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5760 total (1 active), Execution time: mean = 14.760 us, 
total = 85.018 ms, Queueing time: mean = 60.990 us, max = 2.658 ms, min = 7.553 us, total = 351.301 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5760 total (1 active), Execution time: mean = 3.171 us, total = 18.264 ms, Queueing time: mean = 174.313 us, max = 3.551 ms, min = 2.496 us, total = 1.004 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5758 total (0 active), Execution time: mean = 98.079 us, total = 564.740 ms, Queueing time: mean = 91.872 us, max = 2.573 ms, min = 6.667 us, total = 528.999 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5758 total (0 active), Execution time: mean = 550.018 us, total = 3.167 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1921 total (1 active), Execution time: mean = 8.022 us, total = 15.410 ms, Queueing time: mean = 67.663 us, max = 5.703 ms, min = 11.179 us, total = 129.981 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1152 total (0 active), Execution time: mean = 1.377 ms, total = 1.586 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1152 total (0 active), Execution time: mean = 49.909 us, total = 57.495 ms, Queueing time: mean = 93.387 us, max = 3.960 ms, min = 6.906 us, total = 107.581 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1152 total (1 active), Execution time: mean = 522.337 us, total = 601.732 ms, Queueing time: mean = 363.773 us, max = 2.197 ms, min = 6.917 us, total = 419.066 ms [state-dump] NodeManager.GcsCheckAlive - 1152 total (1 active), Execution time: mean = 296.427 us, total = 341.484 ms, Queueing time: mean = 589.300 us, max = 2.307 ms, min = 5.323 us, total = 678.874 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 576 total (1 active), 
Execution time: mean = 1.706 ms, total = 982.929 ms, Queueing time: mean = 63.508 us, max = 1.632 ms, min = 11.175 us, total = 36.581 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 96 total (1 active, 1 running), Execution time: mean = 2.592 ms, total = 248.842 ms, Queueing time: mean = 59.775 us, max = 172.215 us, min = 13.784 us, total = 5.738 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, 
max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:49:16,225 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:49:16,470 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 508015 total (35 active) [state-dump] Queueing time: mean = 7.066 ms, max = 590.169 s, min = -0.000 s, total = 3589.656 s [state-dump] Execution time: mean = 10.784 ms, total = 5478.547 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 122178 total (0 active), Execution time: mean = 453.730 us, total = 55.436 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 122178 total (0 active), Execution time: mean = 31.450 us, total = 3.843 s, Queueing time: mean = 92.816 us, max = 23.460 ms, min = 1.508 us, total = 11.340 s [state-dump] RaySyncer.OnDemandBroadcasting - 58151 total (1 active), Execution time: mean = 9.428 us, total = 548.231 ms, Queueing time: mean = 80.829 us, max = 65.085 ms, min = -0.000 s, total = 4.700 s [state-dump] NodeManager.CheckGC - 58151 total (1 active), Execution time: mean = 4.062 us, total = 236.227 ms, Queueing time: mean = 85.401 us, max = 60.039 ms, min = 3.126 us, total = 4.966 s [state-dump] ObjectManager.UpdateAvailableMemory - 58150 total (0 active), Execution time: mean = 5.033 us, total = 292.658 ms, Queueing time: mean = 87.509 us, max = 48.698 ms, min = 2.040 us, total = 5.089 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29090 total (1 active), Execution time: mean = 16.244 us, total = 472.545 ms, Queueing time: mean = 65.989 us, max = 41.182 ms, min = -0.000 s, total = 1.920 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23236 total (1 active), Execution time: mean = 431.953 us, total = 10.037 s, Queueing time: mean = 64.578 us, max = 13.366 ms, min = 93.000 ns, total = 1.501 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5820 total (1 active), Execution time: mean = 
8.336 us, total = 48.517 ms, Queueing time: mean = 170.864 us, max = 3.537 ms, min = -0.000 s, total = 994.426 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5820 total (1 active), Execution time: mean = 14.754 us, total = 85.868 ms, Queueing time: mean = 61.591 us, max = 3.804 ms, min = 7.553 us, total = 358.461 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5820 total (1 active), Execution time: mean = 3.172 us, total = 18.462 ms, Queueing time: mean = 174.269 us, max = 3.551 ms, min = 2.496 us, total = 1.014 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5818 total (0 active), Execution time: mean = 98.040 us, total = 570.396 ms, Queueing time: mean = 91.932 us, max = 2.573 ms, min = 6.667 us, total = 534.863 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5818 total (0 active), Execution time: mean = 549.991 us, total = 3.200 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1941 total (1 active), Execution time: mean = 8.017 us, total = 15.561 ms, Queueing time: mean = 67.740 us, max = 5.703 ms, min = 11.179 us, total = 131.484 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1164 total (0 active), Execution time: mean = 1.377 ms, total = 1.603 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1164 total (0 active), Execution time: mean = 49.890 us, total = 58.072 ms, Queueing time: mean = 93.429 us, max = 3.960 ms, min = 6.906 us, total = 108.751 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1164 total (1 active), Execution time: mean = 521.951 us, total = 607.551 ms, Queueing time: mean = 364.063 us, max = 2.197 ms, min = 6.917 us, total = 423.769 ms [state-dump] NodeManager.GcsCheckAlive - 1164 total (1 active), 
Execution time: mean = 296.292 us, total = 344.884 ms, Queueing time: mean = 589.191 us, max = 2.307 ms, min = 5.323 us, total = 685.818 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 582 total (1 active), Execution time: mean = 1.707 ms, total = 993.665 ms, Queueing time: mean = 63.579 us, max = 1.632 ms, min = 11.175 us, total = 37.003 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 97 total (1 active, 1 running), Execution time: mean = 2.595 ms, total = 251.689 ms, Queueing time: mean = 59.512 us, max = 172.215 us, min = 13.784 us, total = 5.773 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 
1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: 
mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 
active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:50:16,225 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:50:16,472 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: 
-1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 
[state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request 
bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - 
cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 513249 total (35 active) [state-dump] Queueing time: mean = 6.995 ms, max = 590.169 s, min = -0.000 s, total = 3589.956 s [state-dump] Execution time: mean = 10.676 ms, total = 5479.261 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 123438 total (0 active), Execution time: mean = 453.094 us, total = 55.929 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 123438 total (0 active), Execution time: mean = 31.394 us, total = 3.875 s, Queueing time: mean = 92.582 us, max = 23.460 ms, min = 1.508 us, total = 11.428 s [state-dump] RaySyncer.OnDemandBroadcasting - 58751 total (1 active), Execution time: mean = 9.421 us, total = 553.487 ms, Queueing time: mean = 80.801 us, max = 65.085 ms, min = -0.000 s, total = 4.747 s [state-dump] NodeManager.CheckGC - 58751 total (1 active), Execution time: mean = 4.051 us, total = 238.013 ms, Queueing time: mean = 85.376 us, max = 60.039 ms, min = 3.126 us, total = 5.016 s [state-dump] ObjectManager.UpdateAvailableMemory - 58750 total (0 active), Execution time: mean = 5.027 us, total = 295.311 ms, Queueing time: mean = 87.279 us, max = 48.698 ms, min = 2.040 us, total = 5.128 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29390 total (1 active), Execution time: mean = 16.226 us, total = 476.893 ms, Queueing time: mean = 65.873 us, max = 41.182 ms, min = -0.000 s, total = 1.936 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23475 total (1 active), Execution time: mean = 431.917 us, total = 10.139 s, Queueing time: mean = 
64.566 us, max = 13.366 ms, min = 93.000 ns, total = 1.516 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5880 total (1 active), Execution time: mean = 8.334 us, total = 49.004 ms, Queueing time: mean = 170.997 us, max = 3.537 ms, min = -0.000 s, total = 1.005 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5880 total (1 active), Execution time: mean = 14.755 us, total = 86.758 ms, Queueing time: mean = 61.488 us, max = 3.804 ms, min = 7.553 us, total = 361.552 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5880 total (1 active), Execution time: mean = 3.173 us, total = 18.657 ms, Queueing time: mean = 174.399 us, max = 3.551 ms, min = 2.496 us, total = 1.025 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5878 total (0 active), Execution time: mean = 97.968 us, total = 575.855 ms, Queueing time: mean = 91.724 us, max = 2.573 ms, min = 6.667 us, total = 539.152 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5878 total (0 active), Execution time: mean = 549.006 us, total = 3.227 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1961 total (1 active), Execution time: mean = 8.011 us, total = 15.710 ms, Queueing time: mean = 67.568 us, max = 5.703 ms, min = 11.179 us, total = 132.501 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1176 total (0 active), Execution time: mean = 1.376 ms, total = 1.618 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1176 total (0 active), Execution time: mean = 49.849 us, total = 58.623 ms, Queueing time: mean = 93.406 us, max = 3.960 ms, min = 6.906 us, total = 109.845 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1176 total (1 active), Execution time: mean = 522.018 us, total = 
613.893 ms, Queueing time: mean = 364.597 us, max = 2.197 ms, min = 6.917 us, total = 428.766 ms [state-dump] NodeManager.GcsCheckAlive - 1176 total (1 active), Execution time: mean = 296.124 us, total = 348.241 ms, Queueing time: mean = 590.075 us, max = 2.307 ms, min = 5.323 us, total = 693.928 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 588 total (1 active), Execution time: mean = 1.708 ms, total = 1.004 s, Queueing time: mean = 63.460 us, max = 1.632 ms, min = 11.175 us, total = 37.314 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 98 total (1 active, 1 running), Execution time: mean = 2.586 ms, total = 253.426 ms, Queueing time: mean = 59.646 us, max = 172.215 us, min = 13.784 us, total = 5.845 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:51:16,225 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:51:16,475 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 518481 total (35 active) [state-dump] Queueing time: mean = 6.925 ms, max = 590.169 s, min = -0.000 s, total = 3590.387 s [state-dump] Execution time: mean = 10.570 ms, total = 5480.191 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 124698 total (0 active), Execution time: mean = 453.880 us, total = 56.598 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 124698 total (0 active), Execution time: mean = 31.443 us, total = 3.921 s, Queueing time: mean = 92.846 us, max = 23.460 ms, min = 1.508 us, total = 11.578 s [state-dump] RaySyncer.OnDemandBroadcasting - 59350 total (1 active), Execution time: mean = 9.429 us, total = 559.634 ms, Queueing time: mean = 80.957 us, max = 65.085 ms, min = -0.000 s, total = 4.805 s [state-dump] NodeManager.CheckGC - 59350 total (1 active), Execution time: mean = 4.043 us, total = 239.947 ms, Queueing time: mean = 85.548 us, max = 60.039 ms, min = 3.126 us, total = 5.077 s [state-dump] ObjectManager.UpdateAvailableMemory - 59349 total (0 active), Execution time: mean = 5.037 us, total = 298.924 ms, Queueing time: mean = 87.521 us, max = 48.698 ms, min = 2.040 us, total = 5.194 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29690 total (1 active), Execution time: mean = 16.248 us, total = 482.391 ms, Queueing time: mean = 65.962 
us, max = 41.182 ms, min = -0.000 s, total = 1.958 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23715 total (1 active), Execution time: mean = 432.112 us, total = 10.248 s, Queueing time: mean = 64.758 us, max = 13.366 ms, min = 93.000 ns, total = 1.536 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5940 total (1 active), Execution time: mean = 8.345 us, total = 49.572 ms, Queueing time: mean = 171.365 us, max = 3.537 ms, min = -0.000 s, total = 1.018 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5940 total (1 active), Execution time: mean = 14.775 us, total = 87.762 ms, Queueing time: mean = 61.579 us, max = 3.804 ms, min = 7.553 us, total = 365.778 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5940 total (1 active), Execution time: mean = 3.175 us, total = 18.858 ms, Queueing time: mean = 174.769 us, max = 3.551 ms, min = 2.496 us, total = 1.038 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5938 total (0 active), Execution time: mean = 98.035 us, total = 582.133 ms, Queueing time: mean = 92.013 us, max = 2.573 ms, min = 6.667 us, total = 546.375 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5938 total (0 active), Execution time: mean = 550.014 us, total = 3.266 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1981 total (1 active), Execution time: mean = 8.013 us, total = 15.873 ms, Queueing time: mean = 67.738 us, max = 5.703 ms, min = 11.179 us, total = 134.189 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1188 total (0 active), Execution time: mean = 1.377 ms, total = 1.636 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1188 total (0 active), Execution time: mean = 49.956 us, total = 
59.348 ms, Queueing time: mean = 93.521 us, max = 3.960 ms, min = 6.906 us, total = 111.103 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1188 total (1 active), Execution time: mean = 522.553 us, total = 620.793 ms, Queueing time: mean = 365.328 us, max = 2.197 ms, min = 6.917 us, total = 434.009 ms [state-dump] NodeManager.GcsCheckAlive - 1188 total (1 active), Execution time: mean = 296.444 us, total = 352.176 ms, Queueing time: mean = 591.023 us, max = 2.307 ms, min = 5.323 us, total = 702.136 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 594 total (1 active), Execution time: mean = 1.711 ms, total = 1.016 s, Queueing time: mean = 63.513 us, max = 1.632 ms, min = 11.175 us, total = 37.727 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 99 total (1 active, 1 running), Execution time: mean = 2.572 ms, total = 254.611 ms, Queueing time: mean = 60.394 us, max = 172.215 us, min = 13.784 us, total = 5.979 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.796 s, total = 5398.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 341.406 us, total = 3.414 ms, Queueing time: mean = 66.172 us, max = 184.802 us, min = 20.320 us, total = 661.725 us [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:52:16,225 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:52:16,478 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 523717 total (35 active) [state-dump] Queueing time: mean = 6.856 ms, max = 590.169 s, min = -0.000 s, total = 3590.789 s [state-dump] Execution time: mean = 11.611 ms, total = 6081.057 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 125958 total (0 active), Execution time: mean = 454.205 us, total = 57.211 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 125958 total (0 active), Execution time: mean = 31.460 us, total = 3.963 s, Queueing time: mean = 93.026 us, max = 23.460 ms, min = 1.508 us, total = 11.717 s [state-dump] RaySyncer.OnDemandBroadcasting - 59950 total (1 active), Execution time: mean = 9.438 us, total = 565.828 ms, Queueing time: mean = 81.091 us, max = 65.085 ms, min = -0.000 s, total = 4.861 s [state-dump] NodeManager.CheckGC - 59950 total (1 active), Execution time: mean = 4.034 us, total = 241.828 ms, Queueing time: mean = 85.699 us, max = 60.039 ms, min = 3.126 us, total = 5.138 s [state-dump] ObjectManager.UpdateAvailableMemory - 59949 total (0 active), Execution time: mean = 5.042 us, total = 302.288 ms, Queueing time: mean = 87.641 us, max = 48.698 ms, min = 2.040 
us, total = 5.254 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29990 total (1 active), Execution time: mean = 16.265 us, total = 487.774 ms, Queueing time: mean = 66.123 us, max = 41.182 ms, min = -0.000 s, total = 1.983 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23954 total (1 active), Execution time: mean = 432.272 us, total = 10.355 s, Queueing time: mean = 64.801 us, max = 13.366 ms, min = 93.000 ns, total = 1.552 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6000 total (1 active), Execution time: mean = 8.352 us, total = 50.112 ms, Queueing time: mean = 171.245 us, max = 3.537 ms, min = -0.000 s, total = 1.027 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6000 total (1 active), Execution time: mean = 14.796 us, total = 88.774 ms, Queueing time: mean = 61.660 us, max = 3.804 ms, min = 7.553 us, total = 369.962 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6000 total (1 active), Execution time: mean = 3.176 us, total = 19.054 ms, Queueing time: mean = 174.651 us, max = 3.551 ms, min = 2.496 us, total = 1.048 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5998 total (0 active), Execution time: mean = 98.088 us, total = 588.329 ms, Queueing time: mean = 92.174 us, max = 2.573 ms, min = 6.667 us, total = 552.858 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5998 total (0 active), Execution time: mean = 550.621 us, total = 3.303 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2001 total (1 active), Execution time: mean = 8.017 us, total = 16.042 ms, Queueing time: mean = 67.745 us, max = 5.703 ms, min = 11.179 us, total = 135.558 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1200 total (0 active), Execution time: mean = 1.378 ms, total = 1.654 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1200 total (0 active), Execution time: mean = 49.974 us, total = 59.969 ms, Queueing time: mean = 93.573 us, max = 3.960 ms, min = 6.906 us, total = 112.287 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1200 total (1 active), Execution time: mean = 522.690 us, total = 627.229 ms, Queueing time: mean = 365.210 us, max = 2.197 ms, min = 6.917 us, total = 438.252 ms [state-dump] NodeManager.GcsCheckAlive - 1200 total (1 active), Execution time: mean = 296.551 us, total = 355.861 ms, Queueing time: mean = 590.946 us, max = 2.307 ms, min = 5.323 us, total = 709.135 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 600 total (1 active), Execution time: mean = 1.711 ms, total = 1.026 s, Queueing time: mean = 63.456 us, max = 1.632 ms, min = 11.175 us, total = 38.074 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 100 total (1 active, 1 running), Execution time: mean = 2.576 ms, total = 257.613 ms, Queueing time: mean = 60.497 us, max = 172.215 us, min = 13.784 us, total = 6.050 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:53:16,226 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:53:16,481 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 528949 total (35 active) [state-dump] Queueing time: mean = 6.789 ms, max = 590.169 s, min = -0.000 s, total = 3591.107 s [state-dump] Execution time: mean = 11.498 ms, total = 6081.789 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 127218 total (0 active), Execution time: mean = 453.672 us, total = 57.715 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 127218 total (0 active), Execution time: mean = 31.406 us, total = 3.995 s, Queueing time: mean = 92.907 us, max = 23.460 ms, min = 1.508 us, total = 11.819 s [state-dump] RaySyncer.OnDemandBroadcasting - 60549 total (1 active), Execution time: mean = 9.435 us, total = 571.301 ms, Queueing time: mean = 81.007 us, max = 65.085 ms, min = -0.000 s, total = 4.905 s [state-dump] NodeManager.CheckGC - 60549 total (1 active), Execution time: mean = 4.023 us, total = 243.602 ms, Queueing time: mean = 85.622 us, max = 60.039 ms, min = 3.126 us, total = 5.184 
s [state-dump] ObjectManager.UpdateAvailableMemory - 60548 total (0 active), Execution time: mean = 5.036 us, total = 304.940 ms, Queueing time: mean = 87.438 us, max = 48.698 ms, min = 2.040 us, total = 5.294 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30290 total (1 active), Execution time: mean = 16.257 us, total = 492.425 ms, Queueing time: mean = 66.182 us, max = 41.182 ms, min = -0.000 s, total = 2.005 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24194 total (1 active), Execution time: mean = 432.219 us, total = 10.457 s, Queueing time: mean = 64.754 us, max = 13.366 ms, min = 93.000 ns, total = 1.567 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6060 total (1 active), Execution time: mean = 8.350 us, total = 50.603 ms, Queueing time: mean = 171.545 us, max = 3.537 ms, min = -0.000 s, total = 1.040 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6060 total (1 active), Execution time: mean = 14.785 us, total = 89.599 ms, Queueing time: mean = 61.596 us, max = 3.804 ms, min = 7.553 us, total = 373.273 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6060 total (1 active), Execution time: mean = 3.175 us, total = 19.239 ms, Queueing time: mean = 174.950 us, max = 3.551 ms, min = 2.496 us, total = 1.060 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6058 total (0 active), Execution time: mean = 98.058 us, total = 594.036 ms, Queueing time: mean = 92.003 us, max = 2.573 ms, min = 2.559 us, total = 557.352 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6058 total (0 active), Execution time: mean = 550.172 us, total = 3.333 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2021 total (1 active), Execution time: mean = 8.015 us, total = 16.198 ms, Queueing time: mean = 67.629 us, max = 5.703 ms, min = 11.179 us, total = 
136.678 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1212 total (0 active), Execution time: mean = 1.377 ms, total = 1.669 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1212 total (0 active), Execution time: mean = 49.965 us, total = 60.558 ms, Queueing time: mean = 93.289 us, max = 3.960 ms, min = 6.906 us, total = 113.066 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1212 total (1 active), Execution time: mean = 522.951 us, total = 633.817 ms, Queueing time: mean = 366.297 us, max = 2.197 ms, min = 6.917 us, total = 443.952 ms [state-dump] NodeManager.GcsCheckAlive - 1212 total (1 active), Execution time: mean = 296.603 us, total = 359.483 ms, Queueing time: mean = 592.252 us, max = 2.307 ms, min = 5.323 us, total = 717.809 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 606 total (1 active), Execution time: mean = 1.713 ms, total = 1.038 s, Queueing time: mean = 63.356 us, max = 1.632 ms, min = 11.175 us, total = 38.394 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 101 total (1 active, 1 running), Execution time: mean = 2.579 ms, total = 260.494 ms, Queueing time: mean = 60.906 us, max = 172.215 us, min = 13.784 us, total = 6.151 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:54:16,226 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:54:16,483 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 534183 total (35 active) [state-dump] Queueing time: mean = 6.723 ms, max 
= 590.169 s, min = -0.000 s, total = 3591.452 s [state-dump] Execution time: mean = 11.387 ms, total = 6082.598 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 128478 total (0 active), Execution time: mean = 453.662 us, total = 58.286 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 128478 total (0 active), Execution time: mean = 31.381 us, total = 4.032 s, Queueing time: mean = 92.899 us, max = 23.460 ms, min = 1.508 us, total = 11.935 s [state-dump] RaySyncer.OnDemandBroadcasting - 61149 total (1 active), Execution time: mean = 9.441 us, total = 577.286 ms, Queueing time: mean = 80.993 us, max = 65.085 ms, min = -0.000 s, total = 4.953 s [state-dump] NodeManager.CheckGC - 61149 total (1 active), Execution time: mean = 4.015 us, total = 245.501 ms, Queueing time: mean = 85.620 us, max = 60.039 ms, min = 3.126 us, total = 5.236 s [state-dump] ObjectManager.UpdateAvailableMemory - 61148 total (0 active), Execution time: mean = 5.037 us, total = 308.018 ms, Queueing time: mean = 87.395 us, max = 48.698 ms, min = 2.040 us, total = 5.344 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30590 total (1 active), Execution time: mean = 16.255 us, total = 497.241 ms, Queueing time: mean = 66.189 us, max = 41.182 ms, min = -0.000 s, total = 2.025 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24433 total (1 active), Execution time: mean = 432.236 us, total = 10.561 s, Queueing time: mean = 64.728 us, max = 13.366 ms, min = -0.000 s, total = 1.581 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6120 total (1 active), Execution time: mean = 8.356 us, total = 51.136 ms, Queueing time: mean = 171.570 us, max = 3.537 ms, min = -0.000 s, total = 1.050 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6120 total (1 active), Execution time: mean = 14.787 us, 
total = 90.494 ms, Queueing time: mean = 61.615 us, max = 3.804 ms, min = 7.553 us, total = 377.082 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6120 total (1 active), Execution time: mean = 3.175 us, total = 19.431 ms, Queueing time: mean = 174.977 us, max = 3.551 ms, min = 2.496 us, total = 1.071 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6118 total (0 active), Execution time: mean = 98.067 us, total = 599.972 ms, Queueing time: mean = 92.038 us, max = 2.573 ms, min = 2.559 us, total = 563.089 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6118 total (0 active), Execution time: mean = 550.390 us, total = 3.367 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2041 total (1 active), Execution time: mean = 8.025 us, total = 16.378 ms, Queueing time: mean = 67.555 us, max = 5.703 ms, min = 11.179 us, total = 137.880 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1224 total (0 active), Execution time: mean = 1.377 ms, total = 1.685 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1224 total (0 active), Execution time: mean = 49.978 us, total = 61.173 ms, Queueing time: mean = 93.222 us, max = 3.960 ms, min = 6.906 us, total = 114.103 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1224 total (1 active), Execution time: mean = 523.123 us, total = 640.302 ms, Queueing time: mean = 366.310 us, max = 2.197 ms, min = 6.917 us, total = 448.364 ms [state-dump] NodeManager.GcsCheckAlive - 1224 total (1 active), Execution time: mean = 296.492 us, total = 362.906 ms, Queueing time: mean = 592.547 us, max = 2.307 ms, min = 5.323 us, total = 725.278 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 612 total (1 active), 
Execution time: mean = 1.714 ms, total = 1.049 s, Queueing time: mean = 63.712 us, max = 1.632 ms, min = 11.175 us, total = 38.991 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 102 total (1 active, 1 running), Execution time: mean = 2.588 ms, total = 263.938 ms, Queueing time: mean = 61.233 us, max = 172.215 us, min = 13.784 us, total = 6.246 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, 
max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:55:16,226 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:55:16,486 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 539415 total (35 active)
[state-dump] Queueing time: mean = 6.659 ms, max = 590.169 s, min = -0.000 s, total = 3591.757 s
[state-dump] Execution time: mean = 11.278 ms, total = 6083.326 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 129738 total (0 active), Execution time: mean = 453.144 us, total = 58.790 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 129738 total (0 active), Execution time: mean = 31.323 us, total = 4.064 s, Queueing time: mean = 92.773 us, max = 23.460 ms, min = 1.508 us, total = 12.036 s
[state-dump] RaySyncer.OnDemandBroadcasting - 61748 total (1 active), Execution time: mean = 9.437 us, total = 582.689 ms, Queueing time: mean = 80.905 us, max = 65.085 ms, min = -0.000 s, total = 4.996 s
[state-dump] NodeManager.CheckGC - 61748 total (1 active), Execution time: mean = 4.005 us, total = 247.310 ms, Queueing time: mean = 85.538 us, max = 60.039 ms, min = 3.126 us, total = 5.282 s
[state-dump] ObjectManager.UpdateAvailableMemory - 61747 total (0 active), Execution time: mean = 5.033 us, total = 310.775 ms, Queueing time: mean = 87.200 us, max = 48.698 ms, min = 2.040 us, total = 5.384 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30890 total (1 active), Execution time: mean = 16.255 us, total = 502.108 ms, Queueing time: mean = 66.123 us, max = 41.182 ms, min = -0.000 s, total = 2.043 s
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24673 total (1 active), Execution time: mean = 432.239 us, total = 10.665 s, Queueing time: mean = 64.653 us, max = 13.366 ms, min = -0.000 s, total = 1.595 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 6180 total (1 active), Execution time: mean = 8.351 us, total = 51.610 ms, Queueing time: mean = 171.612 us, max = 3.537 ms, min = -0.000 s, total = 1.061 s
[state-dump] NodeManager.ScheduleAndDispatchTasks - 6180 total (1 active), Execution time: mean = 14.768 us, total = 91.269 ms, Queueing time: mean = 61.503 us, max = 3.804 ms, min = 7.553 us, total = 380.091 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6180 total (1 active), Execution time: mean = 3.174 us, total = 19.617 ms, Queueing time: mean = 175.015 us, max = 3.551 ms, min = 2.496 us, total = 1.082 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6178 total (0 active), Execution time: mean = 97.893 us, total = 604.786 ms, Queueing time: mean = 91.806 us, max = 2.573 ms, min = 2.559 us, total = 567.175 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6178 total (0 active), Execution time: mean = 549.471 us, total = 3.395 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 2061 total (1 active), Execution time: mean = 8.019 us, total = 16.527 ms, Queueing time: mean = 67.399 us, max = 5.703 ms, min = 11.179 us, total = 138.910 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1236 total (0 active), Execution time: mean = 1.376 ms, total = 1.701 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1236 total (0 active), Execution time: mean = 49.975 us, total = 61.769 ms, Queueing time: mean = 93.188 us, max = 3.960 ms, min = 6.906 us, total = 115.180 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 1236 total (1 active), Execution time: mean = 523.624 us, total = 647.199 ms, Queueing time: mean = 366.258 us, max = 2.197 ms, min = 6.917 us, total = 452.695 ms
[state-dump] NodeManager.GcsCheckAlive - 1236 total (1 active), Execution time: mean = 296.675 us, total = 366.690 ms, Queueing time: mean = 592.764 us, max = 2.307 ms, min = 5.323 us, total = 732.657 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 618 total (1 active), Execution time: mean = 1.714 ms, total = 1.059 s, Queueing time: mean = 64.346 us, max = 1.632 ms, min = 11.175 us, total = 39.766 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 103 total (1 active, 1 running), Execution time: mean = 2.580 ms, total = 265.791 ms, Queueing time: mean = 62.115 us, max = 172.215 us, min = 13.784 us, total = 6.398 ms
[state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms
[state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms
[state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 06:56:16,227 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 06:56:16,489 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 544649 total (35 active)
[state-dump] Queueing time: mean = 6.595 ms, max = 590.169 s, min = -0.000 s, total = 3591.958 s
[state-dump] Execution time: mean = 11.170 ms, total = 6083.915 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 130998 total (0 active), Execution time: mean = 451.729 us, total = 59.176 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 130998 total (0 active), Execution time: mean = 31.217 us, total = 4.089 s, Queueing time: mean = 92.313 us, max = 23.460 ms, min = 1.508 us, total = 12.093 s
[state-dump] RaySyncer.OnDemandBroadcasting - 62348 total (1 active), Execution time: mean = 9.417 us, total = 587.111 ms, Queueing time: mean = 80.645 us, max = 65.085 ms, min = -0.000 s, total = 5.028 s
[state-dump] NodeManager.CheckGC - 62348 total (1 active), Execution time: mean = 3.994 us, total = 248.992 ms, Queueing time: mean = 85.271 us, max = 60.039 ms, min = 3.126 us, total = 5.316 s
[state-dump] ObjectManager.UpdateAvailableMemory - 62347 total (0 active), Execution time: mean = 5.017 us, total = 312.774 ms, Queueing time: mean = 86.660 us, max = 48.698 ms, min = 2.040 us, total = 5.403 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 31190 total (1 active), Execution time: mean = 16.216 us, total = 505.772 ms, Queueing time: mean = 65.854 us, max = 41.182 ms, min = -0.000 s, total = 2.054 s
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24912 total (1 active), Execution time: mean = 432.084 us, total = 10.764 s, Queueing time: mean = 64.395 us, max = 13.366 ms, min = -0.000 s, total = 1.604 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 6240 total (1 active), Execution time: mean = 8.329 us, total = 51.971 ms, Queueing time: mean = 171.575 us, max = 3.537 ms, min = -0.000 s, total = 1.071 s
[state-dump] NodeManager.ScheduleAndDispatchTasks - 6240 total (1 active), Execution time: mean = 14.746 us, total = 92.018 ms, Queueing time: mean = 61.282 us, max = 3.804 ms, min = 7.553 us, total = 382.400 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6240 total (1 active), Execution time: mean = 3.170 us, total = 19.781 ms, Queueing time: mean = 174.969 us, max = 3.551 ms, min = 2.496 us, total = 1.092 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6238 total (0 active), Execution time: mean = 97.850 us, total = 610.390 ms, Queueing time: mean = 91.354 us, max = 2.573 ms, min = 2.559 us, total = 569.865 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6238 total (0 active), Execution time: mean = 548.070 us, total = 3.419 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 2081 total (1 active), Execution time: mean = 8.010 us, total = 16.669 ms, Queueing time: mean = 67.115 us, max = 5.703 ms, min = 11.179 us, total = 139.665 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1248 total (0 active), Execution time: mean = 1.373 ms, total = 1.713 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1248 total (0 active), Execution time: mean = 49.879 us, total = 62.249 ms, Queueing time: mean = 92.596 us, max = 3.960 ms, min = 6.906 us, total = 115.559 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 1248 total (1 active), Execution time: mean = 523.272 us, total = 653.043 ms, Queueing time: mean = 366.303 us, max = 2.197 ms, min = 6.917 us, total = 457.146 ms
[state-dump] NodeManager.GcsCheckAlive - 1248 total (1 active), Execution time: mean = 296.407 us, total = 369.916 ms, Queueing time: mean = 592.724 us, max = 2.307 ms, min = 5.323 us, total = 739.720 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 624 total (1 active), Execution time: mean = 1.714 ms, total = 1.069 s, Queueing time: mean = 64.087 us, max = 1.632 ms, min = 11.175 us, total = 39.990 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 104 total (1 active, 1 running), Execution time: mean = 2.583 ms, total = 268.658 ms, Queueing time: mean = 61.795 us, max = 172.215 us, min = 13.784 us, total = 6.427 ms
[state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms
[state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us
[state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms
[state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 6.280 us, total = 43.960 us, Queueing time: mean = 44.707 us, max = 79.050 us, min = 26.627 us, total = 312.950 us
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us
[state-dump] DebugString() time ms: 1
[state-dump]
[state-dump]
[2025-01-21 06:57:16,227 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 06:57:16,492 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 549884 total (35 active) [state-dump] Queueing time: mean = 6.533 ms, max = 590.169 s, min = -0.000 s, total = 3592.205 s [state-dump] Execution time: mean = 11.065 ms, total = 6084.559 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 132258 total (0 active), Execution time: mean = 450.662 us, total = 59.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 132258 total (0 active), Execution time: mean = 31.134 us, total = 4.118 s, Queueing time: mean = 92.003 us, max = 23.460 ms, min = 1.508 us, total = 12.168 s [state-dump] RaySyncer.OnDemandBroadcasting - 62948 total (1 active), Execution time: mean = 9.402 us, total = 591.869 ms, Queueing time: mean = 80.443 us, max = 65.085 ms, min = -0.000 s, total = 5.064 s [state-dump] NodeManager.CheckGC - 62948 total (1 active), Execution time: mean = 3.983 us, total = 250.700 ms, Queueing time: mean = 85.066 us, max = 60.039 ms, min = 3.126 us, total = 5.355 s [state-dump] ObjectManager.UpdateAvailableMemory - 62947 total (0 active), Execution time: mean = 5.006 us, total = 315.131 ms, Queueing time: mean = 86.322 us, max = 48.698 ms, min = 2.040 us, total = 5.434 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 31489 total (1 active), Execution time: mean = 16.187 us, total = 509.711 ms, Queueing time: mean = 65.684 
us, max = 41.182 ms, min = -0.000 s, total = 2.068 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25152 total (1 active), Execution time: mean = 432.071 us, total = 10.867 s, Queueing time: mean = 64.217 us, max = 13.366 ms, min = -0.000 s, total = 1.615 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6300 total (1 active), Execution time: mean = 8.313 us, total = 52.372 ms, Queueing time: mean = 171.689 us, max = 3.537 ms, min = -0.000 s, total = 1.082 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6300 total (1 active), Execution time: mean = 14.730 us, total = 92.796 ms, Queueing time: mean = 61.101 us, max = 3.804 ms, min = 7.553 us, total = 384.934 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6300 total (1 active), Execution time: mean = 3.167 us, total = 19.953 ms, Queueing time: mean = 175.073 us, max = 3.551 ms, min = 2.496 us, total = 1.103 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6298 total (0 active), Execution time: mean = 97.823 us, total = 616.092 ms, Queueing time: mean = 90.996 us, max = 2.573 ms, min = 2.559 us, total = 573.092 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6298 total (0 active), Execution time: mean = 547.174 us, total = 3.446 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2101 total (1 active), Execution time: mean = 7.989 us, total = 16.784 ms, Queueing time: mean = 66.837 us, max = 5.703 ms, min = 11.179 us, total = 140.425 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1260 total (0 active), Execution time: mean = 1.370 ms, total = 1.726 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1260 total (0 active), Execution time: mean = 49.794 us, total = 
62.740 ms, Queueing time: mean = 92.090 us, max = 3.960 ms, min = 6.906 us, total = 116.034 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1260 total (1 active), Execution time: mean = 523.094 us, total = 659.099 ms, Queueing time: mean = 366.826 us, max = 2.197 ms, min = 6.917 us, total = 462.201 ms [state-dump] NodeManager.GcsCheckAlive - 1260 total (1 active), Execution time: mean = 296.203 us, total = 373.216 ms, Queueing time: mean = 593.325 us, max = 2.307 ms, min = 5.323 us, total = 747.590 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 630 total (1 active), Execution time: mean = 1.714 ms, total = 1.080 s, Queueing time: mean = 63.924 us, max = 1.632 ms, min = 11.175 us, total = 40.272 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 105 total (1 active, 1 running), Execution time: mean = 2.587 ms, total = 271.616 ms, Queueing time: mean = 61.482 us, max = 172.215 us, min = 13.784 us, total = 6.456 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:58:16,227 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:58:16,495 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 555115 total (35 active) [state-dump] Queueing time: mean = 6.472 ms, max = 590.169 s, min = -0.000 s, total = 3592.628 s [state-dump] Execution time: mean = 10.962 ms, total = 6085.444 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 133518 total (0 active), Execution time: mean = 451.087 us, total = 60.228 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 133518 total (0 active), Execution time: mean = 31.151 us, total = 4.159 s, Queueing time: mean = 92.219 us, max = 23.460 ms, min = 1.508 us, total = 12.313 s [state-dump] RaySyncer.OnDemandBroadcasting - 63547 total (1 active), Execution time: mean = 9.431 us, total = 599.315 ms, Queueing time: mean = 80.589 us, max = 65.085 ms, min = -0.000 s, total = 5.121 s [state-dump] NodeManager.CheckGC - 63547 total (1 active), Execution time: mean = 3.977 us, total = 252.741 ms, Queueing time: mean = 85.243 us, max = 60.039 ms, min = 3.126 us, total = 5.417 s [state-dump] ObjectManager.UpdateAvailableMemory - 63546 total (0 active), Execution time: mean = 5.021 us, total = 319.053 ms, Queueing time: mean = 86.522 us, max = 48.698 ms, min = 2.040 
us, total = 5.498 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 31789 total (1 active), Execution time: mean = 16.222 us, total = 515.691 ms, Queueing time: mean = 65.829 us, max = 41.182 ms, min = -0.000 s, total = 2.093 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25391 total (1 active), Execution time: mean = 432.321 us, total = 10.977 s, Queueing time: mean = 64.420 us, max = 13.366 ms, min = -0.000 s, total = 1.636 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6360 total (1 active), Execution time: mean = 8.320 us, total = 52.917 ms, Queueing time: mean = 171.746 us, max = 3.537 ms, min = -0.000 s, total = 1.092 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6360 total (1 active), Execution time: mean = 14.784 us, total = 94.029 ms, Queueing time: mean = 61.339 us, max = 3.804 ms, min = 7.553 us, total = 390.117 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6360 total (1 active), Execution time: mean = 3.167 us, total = 20.144 ms, Queueing time: mean = 175.135 us, max = 3.551 ms, min = 2.496 us, total = 1.114 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6358 total (0 active), Execution time: mean = 97.949 us, total = 622.762 ms, Queueing time: mean = 91.207 us, max = 2.573 ms, min = 2.559 us, total = 579.891 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6358 total (0 active), Execution time: mean = 547.972 us, total = 3.484 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2121 total (1 active), Execution time: mean = 8.026 us, total = 17.024 ms, Queueing time: mean = 66.991 us, max = 5.703 ms, min = 11.179 us, total = 142.087 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1272 total (0 active), Execution time: mean = 1.372 ms, total = 1.745 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1272 total (0 active), Execution time: mean = 49.862 us, total = 63.424 ms, Queueing time: mean = 92.300 us, max = 3.960 ms, min = 6.906 us, total = 117.406 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1272 total (1 active), Execution time: mean = 523.207 us, total = 665.520 ms, Queueing time: mean = 366.905 us, max = 2.197 ms, min = 6.917 us, total = 466.703 ms [state-dump] NodeManager.GcsCheckAlive - 1272 total (1 active), Execution time: mean = 296.614 us, total = 377.293 ms, Queueing time: mean = 593.169 us, max = 2.307 ms, min = 5.323 us, total = 754.510 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 636 total (1 active), Execution time: mean = 1.715 ms, total = 1.091 s, Queueing time: mean = 63.969 us, max = 1.632 ms, min = 11.175 us, total = 40.684 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 106 total (1 active, 1 running), Execution time: mean = 2.591 ms, total = 274.652 ms, Queueing time: mean = 61.481 us, max = 172.215 us, min = 13.784 us, total = 6.517 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:59:16,227 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:59:16,497 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 560347 total (35 active) [state-dump] Queueing time: mean = 6.412 ms, max = 590.169 s, min = -0.000 s, total = 3593.059 s [state-dump] Execution time: mean = 10.862 ms, total = 6086.342 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 134778 total (0 active), Execution time: mean = 451.569 us, total = 60.862 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 134778 total (0 active), Execution time: mean = 31.166 us, total = 4.200 s, Queueing time: mean = 92.389 us, max = 23.460 ms, min = 1.508 us, total = 12.452 s [state-dump] RaySyncer.OnDemandBroadcasting - 64146 total (1 active), Execution time: mean = 9.467 us, total = 607.295 ms, Queueing time: mean = 80.791 us, max = 65.085 ms, min = -0.000 s, total = 5.182 s [state-dump] NodeManager.CheckGC - 64146 total (1 active), Execution time: mean = 3.972 us, total = 254.815 ms, Queueing time: mean = 85.483 us, max = 60.039 ms, min = 3.126 us, total = 5.483 
s [state-dump] ObjectManager.UpdateAvailableMemory - 64145 total (0 active), Execution time: mean = 5.037 us, total = 323.091 ms, Queueing time: mean = 86.676 us, max = 48.698 ms, min = 2.040 us, total = 5.560 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32089 total (1 active), Execution time: mean = 16.281 us, total = 522.450 ms, Queueing time: mean = 66.093 us, max = 41.182 ms, min = -0.000 s, total = 2.121 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25631 total (1 active), Execution time: mean = 432.551 us, total = 11.087 s, Queueing time: mean = 64.708 us, max = 13.366 ms, min = -0.000 s, total = 1.659 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6420 total (1 active), Execution time: mean = 8.334 us, total = 53.501 ms, Queueing time: mean = 171.985 us, max = 3.537 ms, min = -0.000 s, total = 1.104 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6420 total (1 active), Execution time: mean = 14.857 us, total = 95.382 ms, Queueing time: mean = 61.464 us, max = 3.804 ms, min = 7.553 us, total = 394.597 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6420 total (1 active), Execution time: mean = 3.171 us, total = 20.356 ms, Queueing time: mean = 175.381 us, max = 3.551 ms, min = 2.496 us, total = 1.126 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6418 total (0 active), Execution time: mean = 98.081 us, total = 629.484 ms, Queueing time: mean = 91.312 us, max = 2.573 ms, min = 2.559 us, total = 586.039 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6418 total (0 active), Execution time: mean = 548.744 us, total = 3.522 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2141 total (1 active), Execution time: mean = 8.059 us, total = 17.254 ms, Queueing time: mean = 67.178 us, max = 5.703 ms, min = 11.179 us, total = 
143.828 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1284 total (0 active), Execution time: mean = 1.374 ms, total = 1.765 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1284 total (0 active), Execution time: mean = 49.935 us, total = 64.117 ms, Queueing time: mean = 92.382 us, max = 3.960 ms, min = 6.906 us, total = 118.619 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1284 total (1 active), Execution time: mean = 523.309 us, total = 671.929 ms, Queueing time: mean = 368.117 us, max = 2.197 ms, min = 6.917 us, total = 472.662 ms [state-dump] NodeManager.GcsCheckAlive - 1284 total (1 active), Execution time: mean = 297.200 us, total = 381.605 ms, Queueing time: mean = 593.931 us, max = 2.307 ms, min = 5.323 us, total = 762.607 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 642 total (1 active), Execution time: mean = 1.717 ms, total = 1.102 s, Queueing time: mean = 64.124 us, max = 1.632 ms, min = 11.175 us, total = 41.168 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 107 total (1 active, 1 running), Execution time: mean = 2.593 ms, total = 277.444 ms, Queueing time: mean = 61.645 us, max = 172.215 us, min = 13.784 us, total = 6.596 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:00:16,228 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:00:16,500 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 565582 total (35 active) [state-dump] Queueing time: mean = 6.354 ms, max 
= 590.169 s, min = -0.000 s, total = 3593.464 s [state-dump] Execution time: mean = 10.763 ms, total = 6087.265 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 136038 total (0 active), Execution time: mean = 452.289 us, total = 61.528 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 136038 total (0 active), Execution time: mean = 31.190 us, total = 4.243 s, Queueing time: mean = 92.577 us, max = 23.460 ms, min = 1.508 us, total = 12.594 s [state-dump] RaySyncer.OnDemandBroadcasting - 64746 total (1 active), Execution time: mean = 9.483 us, total = 613.958 ms, Queueing time: mean = 80.891 us, max = 65.085 ms, min = -0.000 s, total = 5.237 s [state-dump] NodeManager.CheckGC - 64746 total (1 active), Execution time: mean = 3.967 us, total = 256.827 ms, Queueing time: mean = 85.604 us, max = 60.039 ms, min = 3.126 us, total = 5.542 s [state-dump] ObjectManager.UpdateAvailableMemory - 64745 total (0 active), Execution time: mean = 5.045 us, total = 326.616 ms, Queueing time: mean = 86.827 us, max = 48.698 ms, min = 2.040 us, total = 5.622 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32389 total (1 active), Execution time: mean = 16.311 us, total = 528.287 ms, Queueing time: mean = 66.175 us, max = 41.182 ms, min = -0.000 s, total = 2.143 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25871 total (1 active), Execution time: mean = 432.564 us, total = 11.191 s, Queueing time: mean = 64.757 us, max = 13.366 ms, min = -0.000 s, total = 1.675 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6480 total (1 active), Execution time: mean = 8.346 us, total = 54.084 ms, Queueing time: mean = 172.092 us, max = 4.336 ms, min = -0.000 s, total = 1.115 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6480 total (1 active), Execution time: mean = 14.898 us, 
total = 96.540 ms, Queueing time: mean = 61.614 us, max = 3.804 ms, min = 7.553 us, total = 399.258 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6480 total (1 active), Execution time: mean = 3.170 us, total = 20.540 ms, Queueing time: mean = 175.493 us, max = 4.341 ms, min = 2.496 us, total = 1.137 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6478 total (0 active), Execution time: mean = 98.171 us, total = 635.953 ms, Queueing time: mean = 91.489 us, max = 2.573 ms, min = 2.559 us, total = 592.667 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6478 total (0 active), Execution time: mean = 549.516 us, total = 3.560 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2161 total (1 active), Execution time: mean = 8.049 us, total = 17.394 ms, Queueing time: mean = 67.088 us, max = 5.703 ms, min = 11.179 us, total = 144.978 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1296 total (0 active), Execution time: mean = 1.378 ms, total = 1.786 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1296 total (0 active), Execution time: mean = 50.005 us, total = 64.807 ms, Queueing time: mean = 92.357 us, max = 3.960 ms, min = 6.906 us, total = 119.694 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1296 total (1 active), Execution time: mean = 523.789 us, total = 678.831 ms, Queueing time: mean = 367.783 us, max = 2.197 ms, min = 6.917 us, total = 476.647 ms [state-dump] NodeManager.GcsCheckAlive - 1296 total (1 active), Execution time: mean = 297.452 us, total = 385.498 ms, Queueing time: mean = 593.775 us, max = 2.307 ms, min = 5.323 us, total = 769.533 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 648 total (1 active), 
Execution time: mean = 1.717 ms, total = 1.113 s, Queueing time: mean = 64.112 us, max = 1.632 ms, min = 11.175 us, total = 41.544 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 108 total (1 active, 1 running), Execution time: mean = 2.591 ms, total = 279.781 ms, Queueing time: mean = 61.637 us, max = 172.215 us, min = 13.784 us, total = 6.657 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, 
max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:01:16,228 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:01:16,503 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 570813 total (35 active) [state-dump] Queueing time: mean = 6.296 ms, max = 590.169 s, min = -0.000 s, total = 3593.799 s [state-dump] Execution time: mean = 10.665 ms, total = 6087.837 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 137298 total (0 active), Execution time: mean = 450.826 us, total = 61.897 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 137298 total (0 active), Execution time: mean = 31.082 us, total = 4.268 s, Queueing time: mean = 92.085 us, max = 23.460 ms, min = 1.508 us, total = 12.643 s [state-dump] RaySyncer.OnDemandBroadcasting - 65345 total (1 active), Execution time: mean = 9.467 us, total = 618.637 ms, Queueing time: mean = 81.579 us, max = 65.085 ms, min = -0.000 s, total = 5.331 s [state-dump] NodeManager.CheckGC - 65345 total (1 active), Execution time: mean = 3.957 us, total = 258.572 ms, Queueing time: mean = 86.286 us, max = 60.039 ms, min = 3.126 us, total = 5.638 s [state-dump] ObjectManager.UpdateAvailableMemory - 65344 total (0 active), Execution time: mean = 5.028 us, total = 328.572 ms, Queueing time: mean = 86.283 us, max = 48.698 ms, min = 2.040 us, total = 5.638 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32689 total (1 active), Execution time: mean = 16.271 us, total = 531.892 ms, Queueing time: mean = 65.881 us, max = 41.182 ms, min = -0.000 s, total = 2.154 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26110 total (1 active), Execution time: mean = 432.435 us, total = 11.291 s, Queueing time: mean = 65.537 us, max = 27.346 ms, min = -0.000 s, total = 1.711 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6540 total (1 active), Execution time: mean = 
8.330 us, total = 54.476 ms, Queueing time: mean = 171.875 us, max = 4.336 ms, min = -0.000 s, total = 1.124 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6540 total (1 active), Execution time: mean = 14.870 us, total = 97.249 ms, Queueing time: mean = 61.346 us, max = 3.804 ms, min = 7.553 us, total = 401.206 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6540 total (1 active), Execution time: mean = 3.167 us, total = 20.711 ms, Queueing time: mean = 175.268 us, max = 4.341 ms, min = 2.496 us, total = 1.146 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6538 total (0 active), Execution time: mean = 98.114 us, total = 641.471 ms, Queueing time: mean = 91.004 us, max = 2.573 ms, min = 2.559 us, total = 594.984 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6538 total (0 active), Execution time: mean = 547.985 us, total = 3.583 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2181 total (1 active), Execution time: mean = 8.039 us, total = 17.532 ms, Queueing time: mean = 66.797 us, max = 5.703 ms, min = 11.179 us, total = 145.684 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1308 total (0 active), Execution time: mean = 1.375 ms, total = 1.799 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1308 total (0 active), Execution time: mean = 49.955 us, total = 65.342 ms, Queueing time: mean = 91.744 us, max = 3.960 ms, min = 6.906 us, total = 120.001 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1308 total (1 active), Execution time: mean = 523.583 us, total = 684.847 ms, Queueing time: mean = 367.558 us, max = 2.197 ms, min = 6.917 us, total = 480.766 ms [state-dump] NodeManager.GcsCheckAlive - 1308 total (1 active), 
Execution time: mean = 297.294 us, total = 388.861 ms, Queueing time: mean = 593.417 us, max = 2.307 ms, min = 5.323 us, total = 776.189 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 654 total (1 active), Execution time: mean = 1.717 ms, total = 1.123 s, Queueing time: mean = 63.854 us, max = 1.632 ms, min = 11.175 us, total = 41.761 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 109 total (1 active, 1 running), Execution time: mean = 2.593 ms, total = 282.636 ms, Queueing time: mean = 61.259 us, max = 172.215 us, min = 13.784 us, total = 6.677 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 
72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.897 s, total = 5998.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 333.411 us, total = 3.668 ms, Queueing time: mean = 97.445 us, max = 410.175 us, min = 20.320 us, total = 1.072 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:02:16,228 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:02:16,506 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 576047 total (35 active) [state-dump] Queueing time: mean = 6.239 ms, max = 590.169 s, min = -0.000 s, total = 3594.163 s [state-dump] Execution time: mean = 11.611 ms, total = 6688.650 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 138558 total (0 active), Execution time: mean = 450.801 us, total = 62.462 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 138558 total (0 active), Execution time: mean = 31.081 us, total = 4.307 s, Queueing time: mean = 92.101 us, max = 23.460 ms, min = 1.508 us, total = 12.761 s [state-dump] RaySyncer.OnDemandBroadcasting - 65944 total (1 active), Execution time: mean = 9.474 us, total = 624.773 ms, Queueing time: mean = 81.611 us, max = 65.085 ms, min = -0.000 s, total = 5.382 s [state-dump] NodeManager.CheckGC - 65944 total (1 active), Execution time: mean = 3.950 us, total = 260.464 ms, Queueing time: mean = 86.332 us, max = 60.039 ms, min = 3.126 us, total = 5.693 s [state-dump] ObjectManager.UpdateAvailableMemory - 65943 total (0 active), Execution time: mean = 5.030 us, total = 331.676 ms, Queueing time: mean = 86.324 us, max = 48.698 ms, min = 2.040 us, total = 5.692 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32989 total (1 active), Execution time: mean = 16.278 us, total = 536.989 ms, Queueing time: mean = 65.881 us, max = 41.182 ms, min = -0.000 s, total = 2.173 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26350 total (1 active), Execution time: mean = 432.520 us, total = 11.397 s, Queueing time: mean = 65.571 us, max = 27.346 
ms, min = -0.000 s, total = 1.728 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6600 total (1 active), Execution time: mean = 8.334 us, total = 55.006 ms, Queueing time: mean = 172.046 us, max = 4.336 ms, min = -0.000 s, total = 1.136 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6600 total (1 active), Execution time: mean = 14.880 us, total = 98.205 ms, Queueing time: mean = 61.405 us, max = 3.804 ms, min = 7.553 us, total = 405.271 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6600 total (1 active), Execution time: mean = 3.167 us, total = 20.903 ms, Queueing time: mean = 175.441 us, max = 4.341 ms, min = 2.496 us, total = 1.158 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6598 total (0 active), Execution time: mean = 98.144 us, total = 647.554 ms, Queueing time: mean = 91.125 us, max = 2.573 ms, min = 2.559 us, total = 601.242 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6598 total (0 active), Execution time: mean = 548.540 us, total = 3.619 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2201 total (1 active), Execution time: mean = 8.051 us, total = 17.721 ms, Queueing time: mean = 66.836 us, max = 5.703 ms, min = 11.179 us, total = 147.107 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1320 total (0 active), Execution time: mean = 1.376 ms, total = 1.816 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1320 total (0 active), Execution time: mean = 49.962 us, total = 65.950 ms, Queueing time: mean = 91.788 us, max = 3.960 ms, min = 6.906 us, total = 121.161 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1320 total (1 active), Execution time: mean = 524.053 us, total = 691.749 ms, Queueing 
time: mean = 367.814 us, max = 2.197 ms, min = 6.917 us, total = 485.514 ms [state-dump] NodeManager.GcsCheckAlive - 1320 total (1 active), Execution time: mean = 297.388 us, total = 392.552 ms, Queueing time: mean = 594.099 us, max = 2.307 ms, min = 5.323 us, total = 784.210 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 660 total (1 active), Execution time: mean = 1.718 ms, total = 1.134 s, Queueing time: mean = 63.853 us, max = 1.632 ms, min = 11.175 us, total = 42.143 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 110 total (1 active, 1 running), Execution time: mean = 2.596 ms, total = 285.510 ms, Queueing time: mean = 60.999 us, max = 172.215 us, min = 13.784 us, total = 6.710 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:03:16,228 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:03:16,509 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 581281 total (35 active) [state-dump] Queueing time: mean = 6.184 ms, max = 590.169 s, min = -0.000 s, total = 3594.560 s [state-dump] Execution time: mean = 11.508 ms, total = 6689.507 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 139818 total (0 active), Execution time: mean = 451.137 us, total = 63.077 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 139818 total (0 active), Execution time: mean = 31.087 us, total = 4.346 s, Queueing time: mean = 92.219 us, max = 23.460 ms, min = 1.508 us, total = 12.894 s [state-dump] RaySyncer.OnDemandBroadcasting - 66544 total (1 active), Execution time: mean = 9.479 us, total = 630.770 ms, Queueing time: mean = 81.732 us, max = 65.085 ms, min = -0.000 s, total = 5.439 s [state-dump] NodeManager.CheckGC - 66544 total (1 active), Execution time: mean = 3.944 us, total = 262.431 ms, Queueing time: mean = 86.463 us, max = 60.039 ms, min = 3.126 us, total = 5.754 s [state-dump] ObjectManager.UpdateAvailableMemory - 66543 total (0 active), Execution time: mean = 5.034 us, total = 334.946 ms, Queueing time: mean = 86.440 us, max = 48.698 ms, min = 2.040 us, total = 5.752 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 33289 total (1 active), Execution time: mean = 16.296 us, total = 542.490 ms, Queueing time: mean = 66.168 
us, max = 41.182 ms, min = -0.000 s, total = 2.203 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26589 total (1 active), Execution time: mean = 432.416 us, total = 11.498 s, Queueing time: mean = 65.633 us, max = 27.346 ms, min = -0.000 s, total = 1.745 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6660 total (1 active), Execution time: mean = 8.340 us, total = 55.544 ms, Queueing time: mean = 171.868 us, max = 4.336 ms, min = -0.000 s, total = 1.145 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6660 total (1 active), Execution time: mean = 14.890 us, total = 99.167 ms, Queueing time: mean = 61.405 us, max = 3.804 ms, min = 7.553 us, total = 408.957 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6660 total (1 active), Execution time: mean = 3.168 us, total = 21.102 ms, Queueing time: mean = 175.262 us, max = 4.341 ms, min = 2.496 us, total = 1.167 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6658 total (0 active), Execution time: mean = 98.150 us, total = 653.480 ms, Queueing time: mean = 91.109 us, max = 2.573 ms, min = 2.559 us, total = 606.602 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6658 total (0 active), Execution time: mean = 549.230 us, total = 3.657 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2221 total (1 active), Execution time: mean = 8.057 us, total = 17.894 ms, Queueing time: mean = 67.354 us, max = 5.703 ms, min = 11.179 us, total = 149.593 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1332 total (0 active), Execution time: mean = 1.376 ms, total = 1.833 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1332 total (0 active), Execution time: mean = 49.999 us, total = 
66.599 ms, Queueing time: mean = 91.873 us, max = 3.960 ms, min = 6.906 us, total = 122.375 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1332 total (1 active), Execution time: mean = 523.948 us, total = 697.898 ms, Queueing time: mean = 366.987 us, max = 2.197 ms, min = 6.917 us, total = 488.827 ms [state-dump] NodeManager.GcsCheckAlive - 1332 total (1 active), Execution time: mean = 297.330 us, total = 396.043 ms, Queueing time: mean = 593.243 us, max = 2.307 ms, min = 5.323 us, total = 790.199 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 666 total (1 active), Execution time: mean = 1.717 ms, total = 1.143 s, Queueing time: mean = 63.866 us, max = 1.632 ms, min = 10.923 us, total = 42.534 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 111 total (1 active, 1 running), Execution time: mean = 2.598 ms, total = 288.377 ms, Queueing time: mean = 60.849 us, max = 172.215 us, min = 13.784 us, total = 6.754 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:04:16,228 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:04:16,512 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 586513 total (35 active) [state-dump] Queueing time: mean = 6.129 ms, max = 590.169 s, min = -0.000 s, total = 3594.929 s [state-dump] Execution time: mean = 11.407 ms, total = 6690.328 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 141078 total (0 active), Execution time: mean = 451.186 us, total = 63.652 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 141078 total (0 active), Execution time: mean = 31.095 us, total = 4.387 s, Queueing time: mean = 92.296 us, max = 23.460 ms, min = 1.508 us, total = 13.021 s [state-dump] RaySyncer.OnDemandBroadcasting - 67143 total (1 active), Execution time: mean = 9.484 us, total = 636.775 ms, Queueing time: mean = 81.780 us, max = 65.085 ms, min = -0.000 s, total = 5.491 s [state-dump] NodeManager.CheckGC - 67143 total (1 active), Execution time: mean = 3.937 us, total = 264.355 ms, Queueing time: mean = 86.521 us, max = 60.039 ms, min = 3.126 us, total = 5.809 s [state-dump] ObjectManager.UpdateAvailableMemory - 67142 total (0 active), Execution time: mean = 5.036 us, total = 338.103 ms, Queueing time: mean = 86.450 us, max = 48.698 ms, min = 2.040 
us, total = 5.804 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 33589 total (1 active), Execution time: mean = 16.305 us, total = 547.659 ms, Queueing time: mean = 66.177 us, max = 41.182 ms, min = -0.000 s, total = 2.223 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26829 total (1 active), Execution time: mean = 432.470 us, total = 11.603 s, Queueing time: mean = 65.653 us, max = 27.346 ms, min = -0.000 s, total = 1.761 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6720 total (1 active), Execution time: mean = 8.341 us, total = 56.054 ms, Queueing time: mean = 171.897 us, max = 4.336 ms, min = -0.000 s, total = 1.155 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6720 total (1 active), Execution time: mean = 14.893 us, total = 100.081 ms, Queueing time: mean = 61.367 us, max = 3.804 ms, min = 7.553 us, total = 412.387 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6720 total (1 active), Execution time: mean = 3.168 us, total = 21.291 ms, Queueing time: mean = 175.290 us, max = 4.341 ms, min = 2.496 us, total = 1.178 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6718 total (0 active), Execution time: mean = 98.163 us, total = 659.458 ms, Queueing time: mean = 91.159 us, max = 2.573 ms, min = 2.559 us, total = 612.405 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6718 total (0 active), Execution time: mean = 549.540 us, total = 3.692 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2241 total (1 active), Execution time: mean = 8.067 us, total = 18.079 ms, Queueing time: mean = 67.369 us, max = 5.703 ms, min = 11.179 us, total = 150.975 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1344 total (0 active), Execution time: mean = 1.376 ms, total = 1.850 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1344 total (0 active), Execution time: mean = 50.020 us, total = 67.227 ms, Queueing time: mean = 91.780 us, max = 3.960 ms, min = 6.906 us, total = 123.352 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1344 total (1 active), Execution time: mean = 523.979 us, total = 704.227 ms, Queueing time: mean = 367.132 us, max = 2.197 ms, min = 6.917 us, total = 493.426 ms [state-dump] NodeManager.GcsCheckAlive - 1344 total (1 active), Execution time: mean = 297.258 us, total = 399.514 ms, Queueing time: mean = 593.446 us, max = 2.307 ms, min = 5.323 us, total = 797.591 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 672 total (1 active), Execution time: mean = 1.717 ms, total = 1.154 s, Queueing time: mean = 63.786 us, max = 1.632 ms, min = 10.923 us, total = 42.864 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 112 total (1 active, 1 running), Execution time: mean = 2.602 ms, total = 291.392 ms, Queueing time: mean = 60.918 us, max = 172.215 us, min = 13.784 us, total = 6.823 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:05:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:05:16,515 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 591747 total (35 active) [state-dump] Queueing time: mean = 6.076 ms, max = 590.169 s, min = -0.000 s, total = 3595.338 s [state-dump] Execution time: mean = 11.308 ms, total = 6691.212 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 142338 total (0 active), Execution time: mean = 451.664 us, total = 64.289 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 142338 total (0 active), Execution time: mean = 31.086 us, total = 4.425 s, Queueing time: mean = 92.431 us, max = 23.460 ms, min = 1.508 us, total = 13.156 s [state-dump] RaySyncer.OnDemandBroadcasting - 67743 total (1 active), Execution time: mean = 9.491 us, total = 642.948 ms, Queueing time: mean = 81.889 us, max = 65.085 ms, min = -0.000 s, total = 5.547 s [state-dump] NodeManager.CheckGC - 67743 total (1 active), Execution time: mean = 3.930 us, total = 266.213 ms, Queueing time: mean = 86.643 us, max = 60.039 ms, min = 3.126 us, total = 5.869 
s [state-dump] ObjectManager.UpdateAvailableMemory - 67742 total (0 active), Execution time: mean = 5.040 us, total = 341.434 ms, Queueing time: mean = 86.528 us, max = 48.698 ms, min = 2.040 us, total = 5.862 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 33889 total (1 active), Execution time: mean = 16.305 us, total = 552.554 ms, Queueing time: mean = 66.343 us, max = 41.182 ms, min = -0.000 s, total = 2.248 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27068 total (1 active), Execution time: mean = 432.528 us, total = 11.708 s, Queueing time: mean = 65.745 us, max = 27.346 ms, min = -0.000 s, total = 1.780 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6780 total (1 active), Execution time: mean = 8.349 us, total = 56.606 ms, Queueing time: mean = 171.960 us, max = 4.336 ms, min = -0.000 s, total = 1.166 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6780 total (1 active), Execution time: mean = 14.900 us, total = 101.019 ms, Queueing time: mean = 61.814 us, max = 3.804 ms, min = 7.553 us, total = 419.096 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6780 total (1 active), Execution time: mean = 3.168 us, total = 21.478 ms, Queueing time: mean = 175.352 us, max = 4.341 ms, min = 2.496 us, total = 1.189 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6778 total (0 active), Execution time: mean = 98.194 us, total = 665.561 ms, Queueing time: mean = 91.247 us, max = 2.573 ms, min = 2.559 us, total = 618.472 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6778 total (0 active), Execution time: mean = 550.154 us, total = 3.729 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2261 total (1 active), Execution time: mean = 8.065 us, total = 18.235 ms, Queueing time: mean = 70.183 us, max = 6.635 ms, min = 11.179 us, total = 
158.684 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1356 total (0 active), Execution time: mean = 1.377 ms, total = 1.868 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1356 total (0 active), Execution time: mean = 50.060 us, total = 67.881 ms, Queueing time: mean = 91.956 us, max = 3.960 ms, min = 6.906 us, total = 124.692 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1356 total (1 active), Execution time: mean = 524.407 us, total = 711.096 ms, Queueing time: mean = 367.086 us, max = 2.197 ms, min = 6.917 us, total = 497.768 ms [state-dump] NodeManager.GcsCheckAlive - 1356 total (1 active), Execution time: mean = 297.435 us, total = 403.321 ms, Queueing time: mean = 593.695 us, max = 2.307 ms, min = 5.323 us, total = 805.050 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 678 total (1 active), Execution time: mean = 1.717 ms, total = 1.164 s, Queueing time: mean = 63.787 us, max = 1.632 ms, min = 10.923 us, total = 43.248 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 113 total (1 active, 1 running), Execution time: mean = 2.607 ms, total = 294.542 ms, Queueing time: mean = 60.964 us, max = 172.215 us, min = 13.784 us, total = 6.889 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:06:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:06:16,518 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 596978 total (35 active) [state-dump] Queueing time: mean = 6.023 ms, max 
= 590.169 s, min = -0.000 s, total = 3595.543 s [state-dump] Execution time: mean = 11.209 ms, total = 6691.768 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 143598 total (0 active), Execution time: mean = 450.160 us, total = 64.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 143598 total (0 active), Execution time: mean = 30.986 us, total = 4.450 s, Queueing time: mean = 91.988 us, max = 23.460 ms, min = 1.508 us, total = 13.209 s [state-dump] RaySyncer.OnDemandBroadcasting - 68342 total (1 active), Execution time: mean = 9.480 us, total = 647.870 ms, Queueing time: mean = 81.684 us, max = 65.085 ms, min = -0.000 s, total = 5.582 s [state-dump] NodeManager.CheckGC - 68342 total (1 active), Execution time: mean = 3.922 us, total = 268.004 ms, Queueing time: mean = 86.437 us, max = 60.039 ms, min = 3.126 us, total = 5.907 s [state-dump] ObjectManager.UpdateAvailableMemory - 68341 total (0 active), Execution time: mean = 5.028 us, total = 343.650 ms, Queueing time: mean = 86.095 us, max = 48.698 ms, min = 2.040 us, total = 5.884 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34188 total (1 active), Execution time: mean = 16.273 us, total = 556.332 ms, Queueing time: mean = 66.116 us, max = 41.182 ms, min = -0.000 s, total = 2.260 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27308 total (1 active), Execution time: mean = 432.350 us, total = 11.807 s, Queueing time: mean = 65.533 us, max = 27.346 ms, min = -0.000 s, total = 1.790 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6840 total (1 active), Execution time: mean = 8.335 us, total = 57.009 ms, Queueing time: mean = 171.813 us, max = 4.336 ms, min = -0.000 s, total = 1.175 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6840 total (1 active), Execution time: mean = 14.879 us, 
total = 101.769 ms, Queueing time: mean = 61.602 us, max = 3.804 ms, min = 7.553 us, total = 421.355 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6840 total (1 active), Execution time: mean = 3.168 us, total = 21.672 ms, Queueing time: mean = 175.196 us, max = 4.341 ms, min = 2.496 us, total = 1.198 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6838 total (0 active), Execution time: mean = 98.139 us, total = 671.078 ms, Queueing time: mean = 90.832 us, max = 2.573 ms, min = 2.559 us, total = 621.110 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6838 total (0 active), Execution time: mean = 548.902 us, total = 3.753 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2281 total (1 active), Execution time: mean = 8.058 us, total = 18.381 ms, Queueing time: mean = 69.939 us, max = 6.635 ms, min = 11.179 us, total = 159.530 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1368 total (0 active), Execution time: mean = 1.375 ms, total = 1.881 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1368 total (0 active), Execution time: mean = 49.990 us, total = 68.386 ms, Queueing time: mean = 91.659 us, max = 3.960 ms, min = 6.906 us, total = 125.390 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1368 total (1 active), Execution time: mean = 524.248 us, total = 717.171 ms, Queueing time: mean = 366.528 us, max = 2.197 ms, min = 6.917 us, total = 501.410 ms [state-dump] NodeManager.GcsCheckAlive - 1368 total (1 active), Execution time: mean = 297.189 us, total = 406.554 ms, Queueing time: mean = 593.172 us, max = 2.307 ms, min = 5.323 us, total = 811.459 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 684 total (1 active), 
Execution time: mean = 1.717 ms, total = 1.174 s, Queueing time: mean = 63.436 us, max = 1.632 ms, min = 10.923 us, total = 43.390 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 114 total (1 active, 1 running), Execution time: mean = 2.609 ms, total = 297.437 ms, Queueing time: mean = 60.905 us, max = 172.215 us, min = 13.784 us, total = 6.943 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:07:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:07:16,521 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 602212 total (35 active) [state-dump] Queueing time: mean = 5.971 ms, max = 590.169 s, min = -0.000 s, total = 3595.693 s [state-dump] Execution time: mean = 11.113 ms, total = 6692.236 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 144858 total (0 active), Execution time: mean = 448.191 us, total = 64.924 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 144858 total (0 active), Execution time: mean = 30.854 us, total = 4.469 s, Queueing time: mean = 91.395 us, max = 23.460 ms, min = 1.508 us, total = 13.239 s [state-dump] RaySyncer.OnDemandBroadcasting - 68942 total (1 active), Execution time: mean = 9.452 us, total = 651.642 ms, Queueing time: mean = 81.343 us, max = 65.085 ms, min = -0.000 s, total = 5.608 s [state-dump] NodeManager.CheckGC - 68942 total (1 active), Execution time: mean = 3.911 us, total = 269.606 ms, Queueing time: mean = 86.081 us, max = 60.039 ms, min = 3.126 us, total = 5.935 s [state-dump] ObjectManager.UpdateAvailableMemory - 68941 total (0 active), Execution time: mean = 5.008 us, total = 345.267 ms, Queueing time: mean = 85.502 us, max = 48.698 ms, min = 2.040 us, total = 5.895 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34488 total (1 active), Execution time: mean = 16.227 us, total = 559.624 ms, Queueing time: mean = 65.832 us, max = 41.182 ms, min = -0.000 s, total = 2.270 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27547 total (1 active), Execution time: mean = 432.056 us, total = 11.902 s, Queueing time: mean = 65.230 us, max = 27.346 ms, min = -0.000 s, total = 1.797 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6900 total (1 active), Execution time: mean = 
8.321 us, total = 57.413 ms, Queueing time: mean = 171.853 us, max = 4.336 ms, min = -0.000 s, total = 1.186 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6900 total (1 active), Execution time: mean = 14.856 us, total = 102.505 ms, Queueing time: mean = 61.356 us, max = 3.804 ms, min = 7.553 us, total = 423.358 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6900 total (1 active), Execution time: mean = 3.166 us, total = 21.846 ms, Queueing time: mean = 175.228 us, max = 4.341 ms, min = 2.496 us, total = 1.209 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6898 total (0 active), Execution time: mean = 98.036 us, total = 676.251 ms, Queueing time: mean = 90.239 us, max = 2.573 ms, min = 2.559 us, total = 622.469 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6898 total (0 active), Execution time: mean = 546.989 us, total = 3.773 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2301 total (1 active), Execution time: mean = 8.041 us, total = 18.502 ms, Queueing time: mean = 69.591 us, max = 6.635 ms, min = 11.179 us, total = 160.129 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1380 total (0 active), Execution time: mean = 1.371 ms, total = 1.892 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1380 total (0 active), Execution time: mean = 49.915 us, total = 68.883 ms, Queueing time: mean = 91.128 us, max = 3.960 ms, min = 6.906 us, total = 125.756 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1380 total (1 active), Execution time: mean = 524.064 us, total = 723.208 ms, Queueing time: mean = 366.868 us, max = 2.197 ms, min = 6.917 us, total = 506.278 ms [state-dump] NodeManager.GcsCheckAlive - 1380 total (1 active), 
Execution time: mean = 296.896 us, total = 409.716 ms, Queueing time: mean = 593.638 us, max = 2.307 ms, min = 5.323 us, total = 819.220 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 690 total (1 active), Execution time: mean = 1.717 ms, total = 1.185 s, Queueing time: mean = 63.150 us, max = 1.632 ms, min = 10.923 us, total = 43.573 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 115 total (1 active, 1 running), Execution time: mean = 2.609 ms, total = 300.013 ms, Queueing time: mean = 60.584 us, max = 172.215 us, min = 13.784 us, total = 6.967 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 
72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:08:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:08:16,524 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 607447 total (35 active) [state-dump] Queueing time: mean = 5.920 ms, max = 590.169 s, min = -0.000 s, total = 3595.844 s [state-dump] Execution time: mean = 11.018 ms, total = 6692.724 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 146118 total (0 active), Execution time: mean = 446.359 us, total = 65.221 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 146118 total (0 active), Execution time: mean = 30.731 us, total = 4.490 s, Queueing time: mean = 90.848 us, max = 23.460 ms, min = 1.508 us, total = 13.274 s [state-dump] RaySyncer.OnDemandBroadcasting - 69542 total (1 active), Execution time: mean = 9.428 us, total = 655.622 ms, Queueing time: mean = 81.004 us, max = 65.085 ms, min = -0.000 s, total = 5.633 s [state-dump] NodeManager.CheckGC - 69542 total (1 active), Execution time: mean = 3.900 us, total = 271.227 ms, Queueing time: mean = 85.730 us, max = 60.039 ms, min = 3.126 us, total = 5.962 s [state-dump] ObjectManager.UpdateAvailableMemory - 69541 total (0 active), Execution time: mean = 4.989 us, total = 346.912 ms, Queueing time: mean = 84.904 us, max = 48.698 ms, min = 2.040 us, total = 5.904 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34788 total (1 active), Execution time: mean = 16.183 us, total = 562.989 ms, Queueing time: mean = 65.552 us, max = 41.182 ms, min = -0.000 s, total = 2.280 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27787 total (1 active), Execution time: mean = 431.879 us, total = 12.001 s, Queueing time: mean = 64.920 us, max = 27.346 
ms, min = -0.000 s, total = 1.804 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6960 total (1 active), Execution time: mean = 8.308 us, total = 57.826 ms, Queueing time: mean = 171.863 us, max = 4.336 ms, min = -0.000 s, total = 1.196 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6960 total (1 active), Execution time: mean = 14.828 us, total = 103.200 ms, Queueing time: mean = 61.094 us, max = 3.804 ms, min = 7.553 us, total = 425.211 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6960 total (1 active), Execution time: mean = 3.164 us, total = 22.020 ms, Queueing time: mean = 175.231 us, max = 4.341 ms, min = 2.496 us, total = 1.220 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6958 total (0 active), Execution time: mean = 97.953 us, total = 681.559 ms, Queueing time: mean = 89.700 us, max = 2.573 ms, min = 2.559 us, total = 624.135 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6958 total (0 active), Execution time: mean = 545.151 us, total = 3.793 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2321 total (1 active), Execution time: mean = 8.018 us, total = 18.609 ms, Queueing time: mean = 69.238 us, max = 6.635 ms, min = 11.179 us, total = 160.702 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1392 total (0 active), Execution time: mean = 1.367 ms, total = 1.903 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1392 total (0 active), Execution time: mean = 49.796 us, total = 69.316 ms, Queueing time: mean = 90.528 us, max = 3.960 ms, min = 6.906 us, total = 126.015 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1392 total (1 active), Execution time: mean = 524.092 us, total = 729.536 ms, Queueing 
time: mean = 366.843 us, max = 2.197 ms, min = 6.917 us, total = 510.645 ms [state-dump] NodeManager.GcsCheckAlive - 1392 total (1 active), Execution time: mean = 296.619 us, total = 412.894 ms, Queueing time: mean = 593.927 us, max = 2.307 ms, min = 5.323 us, total = 826.746 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 696 total (1 active), Execution time: mean = 1.717 ms, total = 1.195 s, Queueing time: mean = 62.829 us, max = 1.632 ms, min = 10.923 us, total = 43.729 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 116 total (1 active, 1 running), Execution time: mean = 2.611 ms, total = 302.891 ms, Queueing time: mean = 60.359 us, max = 172.215 us, min = 13.784 us, total = 7.002 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:09:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:09:16,527 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 612681 total (35 active) [state-dump] Queueing time: mean = 5.869 ms, max = 590.169 s, min = -0.000 s, total = 3596.002 s [state-dump] Execution time: mean = 10.924 ms, total = 6693.205 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 147378 total (0 active), Execution time: mean = 444.506 us, total = 65.510 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 147378 total (0 active), Execution time: mean = 30.609 us, total = 4.511 s, Queueing time: mean = 90.306 us, max = 23.460 ms, min = 1.508 us, total = 13.309 s [state-dump] RaySyncer.OnDemandBroadcasting - 70142 total (1 active), Execution time: mean = 9.404 us, total = 659.632 ms, Queueing time: mean = 80.709 us, max = 65.085 ms, min = -0.000 s, total = 5.661 s [state-dump] NodeManager.CheckGC - 70142 total (1 active), Execution time: mean = 3.890 us, total = 272.852 ms, Queueing time: mean = 85.423 us, max = 60.039 ms, min = 3.126 us, total = 5.992 s [state-dump] ObjectManager.UpdateAvailableMemory - 70141 total (0 active), Execution time: mean = 4.969 us, total = 348.528 ms, Queueing time: mean = 84.336 us, max = 48.698 ms, min = 2.040 us, total = 5.915 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35088 total (1 active), Execution time: mean = 16.138 us, total = 566.242 ms, Queueing time: mean = 65.283 
us, max = 41.182 ms, min = -0.000 s, total = 2.291 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28026 total (1 active), Execution time: mean = 431.669 us, total = 12.098 s, Queueing time: mean = 64.646 us, max = 27.346 ms, min = -0.000 s, total = 1.812 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7020 total (1 active), Execution time: mean = 8.291 us, total = 58.205 ms, Queueing time: mean = 171.852 us, max = 4.336 ms, min = -0.000 s, total = 1.206 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7020 total (1 active), Execution time: mean = 14.788 us, total = 103.810 ms, Queueing time: mean = 60.816 us, max = 3.804 ms, min = 7.553 us, total = 426.926 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7020 total (1 active), Execution time: mean = 3.161 us, total = 22.192 ms, Queueing time: mean = 175.210 us, max = 4.341 ms, min = 2.496 us, total = 1.230 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7018 total (0 active), Execution time: mean = 97.851 us, total = 686.715 ms, Queueing time: mean = 89.160 us, max = 2.573 ms, min = 2.559 us, total = 625.728 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7018 total (0 active), Execution time: mean = 543.306 us, total = 3.813 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2341 total (1 active), Execution time: mean = 7.997 us, total = 18.720 ms, Queueing time: mean = 68.893 us, max = 6.635 ms, min = 11.179 us, total = 161.278 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1404 total (0 active), Execution time: mean = 1.365 ms, total = 1.916 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1404 total (0 active), Execution time: mean = 49.698 us, total = 
69.776 ms, Queueing time: mean = 90.018 us, max = 3.960 ms, min = 6.906 us, total = 126.385 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1404 total (1 active), Execution time: mean = 524.230 us, total = 736.019 ms, Queueing time: mean = 366.554 us, max = 2.197 ms, min = 6.917 us, total = 514.642 ms [state-dump] NodeManager.GcsCheckAlive - 1404 total (1 active), Execution time: mean = 296.311 us, total = 416.020 ms, Queueing time: mean = 594.087 us, max = 2.307 ms, min = 5.323 us, total = 834.098 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 702 total (1 active), Execution time: mean = 1.718 ms, total = 1.206 s, Queueing time: mean = 62.563 us, max = 1.632 ms, min = 10.923 us, total = 43.919 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 117 total (1 active, 1 running), Execution time: mean = 2.614 ms, total = 305.803 ms, Queueing time: mean = 60.079 us, max = 172.215 us, min = 13.784 us, total = 7.029 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:10:16,229 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:10:16,530 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 617913 total (35 active) [state-dump] Queueing time: mean = 5.820 ms, max = 590.169 s, min = -0.000 s, total = 3596.283 s [state-dump] Execution time: mean = 10.833 ms, total = 6693.900 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 148638 total (0 active), Execution time: mean = 443.977 us, total = 65.992 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 148638 total (0 active), Execution time: mean = 30.549 us, total = 4.541 s, Queueing time: mean = 90.164 us, max = 23.460 ms, min = 1.508 us, total = 13.402 s [state-dump] RaySyncer.OnDemandBroadcasting - 70741 total (1 active), Execution time: mean = 9.398 us, total = 664.841 ms, Queueing time: mean = 80.601 us, max = 65.085 ms, min = -0.000 s, total = 5.702 s [state-dump] NodeManager.CheckGC - 70741 total (1 active), Execution time: mean = 3.883 us, total = 274.662 ms, Queueing time: mean = 85.316 us, max = 60.039 ms, min = 3.126 us, total = 6.035 s [state-dump] ObjectManager.UpdateAvailableMemory - 70740 total (0 active), Execution time: mean = 4.961 us, total = 350.962 ms, Queueing time: mean = 84.091 us, max = 48.698 ms, min = 2.040 
us, total = 5.949 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35388 total (1 active), Execution time: mean = 16.118 us, total = 570.386 ms, Queueing time: mean = 65.184 us, max = 41.182 ms, min = -0.000 s, total = 2.307 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28266 total (1 active), Execution time: mean = 431.543 us, total = 12.198 s, Queueing time: mean = 64.581 us, max = 27.346 ms, min = -0.000 s, total = 1.825 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7080 total (1 active), Execution time: mean = 8.283 us, total = 58.646 ms, Queueing time: mean = 171.773 us, max = 4.336 ms, min = -0.000 s, total = 1.216 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7080 total (1 active), Execution time: mean = 14.786 us, total = 104.684 ms, Queueing time: mean = 60.729 us, max = 3.804 ms, min = 7.553 us, total = 429.963 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7080 total (1 active), Execution time: mean = 3.160 us, total = 22.370 ms, Queueing time: mean = 175.125 us, max = 4.341 ms, min = 2.496 us, total = 1.240 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7078 total (0 active), Execution time: mean = 97.792 us, total = 692.168 ms, Queueing time: mean = 89.028 us, max = 2.573 ms, min = 2.559 us, total = 630.143 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7078 total (0 active), Execution time: mean = 542.826 us, total = 3.842 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2361 total (1 active), Execution time: mean = 7.984 us, total = 18.850 ms, Queueing time: mean = 68.696 us, max = 6.635 ms, min = 11.179 us, total = 162.192 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1416 total (0 active), Execution time: mean = 1.362 ms, total = 1.928 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1416 total (0 active), Execution time: mean = 49.617 us, total = 70.257 ms, Queueing time: mean = 89.706 us, max = 3.960 ms, min = 6.906 us, total = 127.024 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1416 total (1 active), Execution time: mean = 523.740 us, total = 741.616 ms, Queueing time: mean = 366.537 us, max = 2.197 ms, min = 6.917 us, total = 519.016 ms [state-dump] NodeManager.GcsCheckAlive - 1416 total (1 active), Execution time: mean = 295.965 us, total = 419.086 ms, Queueing time: mean = 593.928 us, max = 2.307 ms, min = 5.323 us, total = 841.002 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 708 total (1 active), Execution time: mean = 1.717 ms, total = 1.215 s, Queueing time: mean = 62.322 us, max = 1.632 ms, min = 10.923 us, total = 44.124 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 118 total (1 active, 1 running), Execution time: mean = 2.617 ms, total = 308.806 ms, Queueing time: mean = 59.708 us, max = 172.215 us, min = 13.784 us, total = 7.046 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:11:16,230 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:11:16,533 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 623148 total (35 active) [state-dump] Queueing time: mean = 5.772 ms, max = 590.169 s, min = -0.000 s, total = 3596.704 s [state-dump] Execution time: mean = 10.744 ms, total = 6694.826 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 149898 total (0 active), Execution time: mean = 444.713 us, total = 66.662 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 149898 total (0 active), Execution time: mean = 30.584 us, total = 4.584 s, Queueing time: mean = 90.452 us, max = 23.460 ms, min = 1.508 us, total = 13.559 s [state-dump] RaySyncer.OnDemandBroadcasting - 71341 total (1 active), Execution time: mean = 9.405 us, total = 670.962 ms, Queueing time: mean = 80.674 us, max = 65.085 ms, min = -0.000 s, total = 5.755 s [state-dump] NodeManager.CheckGC - 71341 total (1 active), Execution time: mean = 3.878 us, total = 276.630 ms, Queueing time: mean = 85.400 us, max = 60.039 ms, min = 3.126 us, total = 6.093 
s [state-dump] ObjectManager.UpdateAvailableMemory - 71340 total (0 active), Execution time: mean = 4.967 us, total = 354.360 ms, Queueing time: mean = 84.261 us, max = 48.698 ms, min = 2.040 us, total = 6.011 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35688 total (1 active), Execution time: mean = 16.120 us, total = 575.302 ms, Queueing time: mean = 65.253 us, max = 41.182 ms, min = -0.000 s, total = 2.329 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28506 total (1 active), Execution time: mean = 431.587 us, total = 12.303 s, Queueing time: mean = 64.688 us, max = 27.346 ms, min = -0.000 s, total = 1.844 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7140 total (1 active), Execution time: mean = 8.294 us, total = 59.217 ms, Queueing time: mean = 171.952 us, max = 4.336 ms, min = -0.000 s, total = 1.228 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7140 total (1 active), Execution time: mean = 14.800 us, total = 105.670 ms, Queueing time: mean = 60.760 us, max = 3.804 ms, min = 7.553 us, total = 433.830 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7140 total (1 active), Execution time: mean = 3.162 us, total = 22.576 ms, Queueing time: mean = 175.309 us, max = 4.341 ms, min = 2.496 us, total = 1.252 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7138 total (0 active), Execution time: mean = 97.872 us, total = 698.614 ms, Queueing time: mean = 89.288 us, max = 2.573 ms, min = 2.559 us, total = 637.335 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7138 total (0 active), Execution time: mean = 543.740 us, total = 3.881 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2381 total (1 active), Execution time: mean = 7.990 us, total = 19.024 ms, Queueing time: mean = 68.701 us, max = 6.635 ms, min = 11.179 us, total = 
163.577 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1428 total (0 active), Execution time: mean = 1.363 ms, total = 1.947 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1428 total (0 active), Execution time: mean = 49.663 us, total = 70.919 ms, Queueing time: mean = 89.829 us, max = 3.960 ms, min = 6.906 us, total = 128.276 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1428 total (1 active), Execution time: mean = 524.098 us, total = 748.413 ms, Queueing time: mean = 366.944 us, max = 2.197 ms, min = 6.917 us, total = 523.997 ms [state-dump] NodeManager.GcsCheckAlive - 1428 total (1 active), Execution time: mean = 296.000 us, total = 422.689 ms, Queueing time: mean = 594.692 us, max = 2.307 ms, min = 5.323 us, total = 849.220 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 714 total (1 active), Execution time: mean = 1.718 ms, total = 1.227 s, Queueing time: mean = 62.400 us, max = 1.632 ms, min = 10.923 us, total = 44.554 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 119 total (1 active, 1 running), Execution time: mean = 2.620 ms, total = 311.745 ms, Queueing time: mean = 59.680 us, max = 172.215 us, min = 13.784 us, total = 7.102 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.597 s, total = 6598.759 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 327.929 us, total = 3.935 ms, Queueing time: mean = 99.248 us, max = 410.175 us, min = 20.320 us, total = 1.191 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 6.515 us, total = 52.120 us, Queueing time: mean = 48.513 us, max = 79.050 us, min = 26.627 us, total = 388.107 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:12:16,230 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:12:16,536 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 628382 total (35 active) [state-dump] Queueing time: mean = 5.724 ms, max 
= 590.169 s, min = -0.000 s, total = 3597.062 s [state-dump] Execution time: mean = 11.610 ms, total = 7295.673 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 151158 total (0 active), Execution time: mean = 445.055 us, total = 67.274 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 151158 total (0 active), Execution time: mean = 30.575 us, total = 4.622 s, Queueing time: mean = 90.596 us, max = 23.460 ms, min = 1.508 us, total = 13.694 s [state-dump] RaySyncer.OnDemandBroadcasting - 71940 total (1 active), Execution time: mean = 9.402 us, total = 676.378 ms, Queueing time: mean = 80.646 us, max = 65.085 ms, min = -0.000 s, total = 5.802 s [state-dump] NodeManager.CheckGC - 71940 total (1 active), Execution time: mean = 3.871 us, total = 278.490 ms, Queueing time: mean = 85.375 us, max = 60.039 ms, min = 3.126 us, total = 6.142 s [state-dump] ObjectManager.UpdateAvailableMemory - 71939 total (0 active), Execution time: mean = 4.966 us, total = 357.253 ms, Queueing time: mean = 84.275 us, max = 48.698 ms, min = 2.040 us, total = 6.063 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35988 total (1 active), Execution time: mean = 16.115 us, total = 579.940 ms, Queueing time: mean = 65.254 us, max = 41.182 ms, min = -0.000 s, total = 2.348 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28745 total (1 active), Execution time: mean = 431.534 us, total = 12.404 s, Queueing time: mean = 64.649 us, max = 27.346 ms, min = -0.000 s, total = 1.858 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7200 total (1 active), Execution time: mean = 8.293 us, total = 59.712 ms, Queueing time: mean = 171.800 us, max = 4.336 ms, min = -0.000 s, total = 1.237 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7200 total (1 active), Execution time: mean = 14.806 us, 
total = 106.602 ms, Queueing time: mean = 60.753 us, max = 3.804 ms, min = 7.553 us, total = 437.423 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7200 total (1 active), Execution time: mean = 3.161 us, total = 22.759 ms, Queueing time: mean = 175.158 us, max = 4.341 ms, min = 2.496 us, total = 1.261 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7198 total (0 active), Execution time: mean = 97.869 us, total = 704.461 ms, Queueing time: mean = 89.362 us, max = 2.573 ms, min = 2.559 us, total = 643.231 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7198 total (0 active), Execution time: mean = 544.049 us, total = 3.916 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2401 total (1 active), Execution time: mean = 7.986 us, total = 19.174 ms, Queueing time: mean = 68.559 us, max = 6.635 ms, min = 11.179 us, total = 164.611 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1440 total (0 active), Execution time: mean = 1.363 ms, total = 1.962 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1440 total (0 active), Execution time: mean = 49.646 us, total = 71.490 ms, Queueing time: mean = 89.733 us, max = 3.960 ms, min = 6.906 us, total = 129.216 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1440 total (1 active), Execution time: mean = 523.847 us, total = 754.339 ms, Queueing time: mean = 366.612 us, max = 2.197 ms, min = 6.917 us, total = 527.922 ms [state-dump] NodeManager.GcsCheckAlive - 1440 total (1 active), Execution time: mean = 295.751 us, total = 425.881 ms, Queueing time: mean = 594.344 us, max = 2.307 ms, min = 5.323 us, total = 855.855 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 720 total (1 active), 
Execution time: mean = 1.717 ms, total = 1.236 s, Queueing time: mean = 62.330 us, max = 1.632 ms, min = 10.033 us, total = 44.877 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 120 total (1 active, 1 running), Execution time: mean = 2.621 ms, total = 314.489 ms, Queueing time: mean = 59.656 us, max = 172.215 us, min = 13.784 us, total = 7.159 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, 
Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:13:16,230 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:13:16,539 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 633613 total (35 active) [state-dump] Queueing time: mean = 5.678 ms, max = 590.169 s, min = -0.000 s, total = 3597.554 s [state-dump] Execution time: mean = 11.516 ms, total = 7296.613 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 152418 total (0 active), Execution time: mean = 445.581 us, total = 67.915 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 152418 total (0 active), Execution time: mean = 30.597 us, total = 4.664 s, Queueing time: mean = 90.837 us, max = 23.460 ms, min = 1.508 us, total = 13.845 s [state-dump] RaySyncer.OnDemandBroadcasting - 72539 total (1 active), Execution time: mean = 9.401 us, total = 681.921 ms, Queueing time: mean = 81.255 us, max = 65.085 ms, min = -0.000 s, total = 5.894 s [state-dump] NodeManager.CheckGC - 72539 total (1 active), Execution time: mean = 3.865 us, total = 280.359 ms, Queueing time: mean = 85.989 us, max = 60.039 ms, min = 3.126 us, total = 6.238 s [state-dump] ObjectManager.UpdateAvailableMemory - 72538 total (0 active), Execution time: mean = 4.968 us, total = 360.359 ms, Queueing time: mean = 84.435 us, max = 48.698 ms, min = 2.040 us, total = 6.125 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36288 total (1 active), Execution time: mean = 16.118 us, total = 584.872 ms, Queueing time: mean = 65.394 us, max = 41.182 ms, min = -0.000 s, total = 2.373 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28984 total (1 active), Execution time: mean = 433.319 us, total = 12.559 s, Queueing time: mean = 64.718 us, max = 27.346 ms, min = -0.000 s, total = 1.876 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7260 total (1 active), Execution time: mean = 
8.293 us, total = 60.208 ms, Queueing time: mean = 171.910 us, max = 4.336 ms, min = -0.000 s, total = 1.248 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7260 total (1 active), Execution time: mean = 14.813 us, total = 107.545 ms, Queueing time: mean = 60.853 us, max = 3.804 ms, min = 7.553 us, total = 441.791 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7260 total (1 active), Execution time: mean = 3.161 us, total = 22.947 ms, Queueing time: mean = 175.267 us, max = 4.341 ms, min = 2.496 us, total = 1.272 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7258 total (0 active), Execution time: mean = 97.903 us, total = 710.582 ms, Queueing time: mean = 89.533 us, max = 2.573 ms, min = 2.559 us, total = 649.831 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7258 total (0 active), Execution time: mean = 544.772 us, total = 3.954 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2421 total (1 active), Execution time: mean = 7.989 us, total = 19.341 ms, Queueing time: mean = 68.552 us, max = 6.635 ms, min = 11.179 us, total = 165.965 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1452 total (0 active), Execution time: mean = 1.363 ms, total = 1.979 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1452 total (0 active), Execution time: mean = 49.656 us, total = 72.101 ms, Queueing time: mean = 89.786 us, max = 3.960 ms, min = 6.906 us, total = 130.369 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1452 total (1 active), Execution time: mean = 523.907 us, total = 760.713 ms, Queueing time: mean = 366.985 us, max = 2.197 ms, min = 6.917 us, total = 532.862 ms [state-dump] NodeManager.GcsCheckAlive - 1452 total (1 active), 
Execution time: mean = 295.700 us, total = 429.356 ms, Queueing time: mean = 594.868 us, max = 2.307 ms, min = 5.323 us, total = 863.748 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 726 total (1 active), Execution time: mean = 1.718 ms, total = 1.247 s, Queueing time: mean = 62.415 us, max = 1.632 ms, min = 10.033 us, total = 45.313 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 121 total (1 active, 1 running), Execution time: mean = 2.623 ms, total = 317.396 ms, Queueing time: mean = 59.798 us, max = 172.215 us, min = 13.784 us, total = 7.236 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:14:16,231 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:14:16,541 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 638848 total (35 active) [state-dump] Queueing time: mean = 5.632 ms, max = 590.169 s, min = -0.000 s, total = 3597.886 s [state-dump] Execution time: mean = 11.423 ms, total = 7297.401 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 153678 total (0 active), Execution time: mean = 445.556 us, total = 68.472 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 153678 total (0 active), Execution time: mean = 30.579 us, total = 4.699 s, Queueing time: mean = 90.837 us, max = 23.460 ms, min = 1.508 us, total = 13.960 s [state-dump] RaySyncer.OnDemandBroadcasting - 73139 total (1 active), Execution time: mean = 9.400 us, total = 687.533 ms, Queueing time: mean = 81.227 us, max = 65.085 ms, min = -0.000 s, total = 5.941 s [state-dump] NodeManager.CheckGC - 73139 total (1 active), Execution time: mean = 3.859 us, total = 282.218 ms, Queueing time: mean = 85.966 us, max = 60.039 ms, min = 3.126 us, total = 6.287 s [state-dump] ObjectManager.UpdateAvailableMemory - 73138 total (0 active), Execution time: mean = 4.962 us, total = 362.915 ms, Queueing time: mean = 84.313 us, max = 48.698 ms, min = 2.040 us, total = 6.166 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36588 total (1 active), Execution time: mean = 16.119 us, total = 589.760 ms, Queueing time: mean = 65.389 us, max = 41.182 ms, min = -0.000 s, total = 2.392 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29224 total (1 active), Execution time: mean = 433.210 us, total = 12.660 s, Queueing time: mean = 64.657 us, max = 27.346 
ms, min = -0.000 s, total = 1.890 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7320 total (1 active), Execution time: mean = 8.289 us, total = 60.674 ms, Queueing time: mean = 171.968 us, max = 4.336 ms, min = -0.000 s, total = 1.259 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7320 total (1 active), Execution time: mean = 14.815 us, total = 108.445 ms, Queueing time: mean = 60.824 us, max = 3.804 ms, min = 7.553 us, total = 445.235 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7320 total (1 active), Execution time: mean = 3.161 us, total = 23.140 ms, Queueing time: mean = 175.321 us, max = 4.341 ms, min = 2.496 us, total = 1.283 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7318 total (0 active), Execution time: mean = 97.916 us, total = 716.551 ms, Queueing time: mean = 89.539 us, max = 2.573 ms, min = 2.559 us, total = 655.247 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7318 total (0 active), Execution time: mean = 544.806 us, total = 3.987 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2441 total (1 active), Execution time: mean = 8.002 us, total = 19.533 ms, Queueing time: mean = 68.414 us, max = 6.635 ms, min = 11.179 us, total = 166.998 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1464 total (0 active), Execution time: mean = 1.362 ms, total = 1.994 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1464 total (0 active), Execution time: mean = 49.665 us, total = 72.710 ms, Queueing time: mean = 89.623 us, max = 3.960 ms, min = 6.906 us, total = 131.207 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1464 total (1 active), Execution time: mean = 523.937 us, total = 767.044 ms, Queueing 
time: mean = 367.328 us, max = 2.197 ms, min = 6.917 us, total = 537.768 ms [state-dump] NodeManager.GcsCheckAlive - 1464 total (1 active), Execution time: mean = 295.542 us, total = 432.673 ms, Queueing time: mean = 595.338 us, max = 2.307 ms, min = 5.323 us, total = 871.575 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 732 total (1 active), Execution time: mean = 1.719 ms, total = 1.258 s, Queueing time: mean = 62.452 us, max = 1.632 ms, min = 10.033 us, total = 45.715 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 122 total (1 active, 1 running), Execution time: mean = 2.624 ms, total = 320.179 ms, Queueing time: mean = 60.384 us, max = 172.215 us, min = 13.784 us, total = 7.367 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:15:16,231 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:15:16,544 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 644080 total (35 active) [state-dump] Queueing time: mean = 5.587 ms, max = 590.169 s, min = -0.000 s, total = 3598.271 s [state-dump] Execution time: mean = 11.331 ms, total = 7298.271 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 154938 total (0 active), Execution time: mean = 446.017 us, total = 69.105 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 154938 total (0 active), Execution time: mean = 30.592 us, total = 4.740 s, Queueing time: mean = 91.032 us, max = 23.460 ms, min = 1.508 us, total = 14.104 s [state-dump] RaySyncer.OnDemandBroadcasting - 73738 total (1 active), Execution time: mean = 9.393 us, total = 692.625 ms, Queueing time: mean = 81.203 us, max = 65.085 ms, min = -0.000 s, total = 5.988 s [state-dump] NodeManager.CheckGC - 73738 total (1 active), Execution time: mean = 3.851 us, total = 283.960 ms, Queueing time: mean = 85.941 us, max = 60.039 ms, min = 3.126 us, total = 6.337 s [state-dump] ObjectManager.UpdateAvailableMemory - 73737 total (0 active), Execution time: mean = 4.960 us, total = 365.706 ms, Queueing time: mean = 84.430 us, max = 48.698 ms, min = 2.040 us, total = 6.226 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36888 total (1 active), Execution time: mean = 16.104 us, total = 594.035 ms, Queueing time: mean = 65.389 
us, max = 41.182 ms, min = -0.000 s, total = 2.412 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29464 total (1 active), Execution time: mean = 433.086 us, total = 12.760 s, Queueing time: mean = 64.720 us, max = 27.346 ms, min = -0.000 s, total = 1.907 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7380 total (1 active), Execution time: mean = 8.285 us, total = 61.147 ms, Queueing time: mean = 172.033 us, max = 4.336 ms, min = -0.000 s, total = 1.270 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7380 total (1 active), Execution time: mean = 14.810 us, total = 109.297 ms, Queueing time: mean = 60.812 us, max = 3.804 ms, min = 7.553 us, total = 448.792 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7380 total (1 active), Execution time: mean = 3.162 us, total = 23.333 ms, Queueing time: mean = 175.384 us, max = 4.341 ms, min = 2.496 us, total = 1.294 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7378 total (0 active), Execution time: mean = 97.944 us, total = 722.631 ms, Queueing time: mean = 89.685 us, max = 2.573 ms, min = 2.559 us, total = 661.698 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7378 total (0 active), Execution time: mean = 545.209 us, total = 4.023 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2461 total (1 active), Execution time: mean = 8.003 us, total = 19.695 ms, Queueing time: mean = 68.400 us, max = 6.635 ms, min = 11.179 us, total = 168.333 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1476 total (0 active), Execution time: mean = 1.361 ms, total = 2.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1476 total (0 active), Execution time: mean = 49.651 us, total = 
73.285 ms, Queueing time: mean = 89.646 us, max = 3.960 ms, min = 6.906 us, total = 132.317 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1476 total (1 active), Execution time: mean = 524.036 us, total = 773.477 ms, Queueing time: mean = 367.510 us, max = 2.197 ms, min = 6.917 us, total = 542.445 ms [state-dump] NodeManager.GcsCheckAlive - 1476 total (1 active), Execution time: mean = 295.329 us, total = 435.906 ms, Queueing time: mean = 595.856 us, max = 2.307 ms, min = 5.323 us, total = 879.484 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 738 total (1 active), Execution time: mean = 1.719 ms, total = 1.269 s, Queueing time: mean = 62.387 us, max = 1.632 ms, min = 10.033 us, total = 46.042 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 123 total (1 active, 1 running), Execution time: mean = 2.626 ms, total = 322.970 ms, Queueing time: mean = 60.280 us, max = 172.215 us, min = 13.784 us, total = 7.414 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:16:16,231 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:16:16,547 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 649314 total (35 active) [state-dump] Queueing time: mean = 5.542 ms, max = 590.169 s, min = -0.000 s, total = 3598.674 s [state-dump] Execution time: mean = 11.241 ms, total = 7299.163 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 156198 total (0 active), Execution time: mean = 446.544 us, total = 69.749 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 156198 total (0 active), Execution time: mean = 30.600 us, total = 4.780 s, Queueing time: mean = 91.192 us, max = 23.460 ms, min = 1.508 us, total = 14.244 s [state-dump] RaySyncer.OnDemandBroadcasting - 74338 total (1 active), Execution time: mean = 9.398 us, total = 698.618 ms, Queueing time: mean = 81.272 us, max = 65.085 ms, min = -0.000 s, total = 6.042 s [state-dump] NodeManager.CheckGC - 74338 total (1 active), Execution time: mean = 3.846 us, total = 285.876 ms, Queueing time: mean = 86.019 us, max = 60.039 ms, min = 3.126 us, total = 6.394 s [state-dump] ObjectManager.UpdateAvailableMemory - 74337 total (0 active), Execution time: mean = 4.965 us, total = 369.098 ms, Queueing time: mean = 84.613 us, max = 48.698 ms, min = 2.040 
us, total = 6.290 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 37188 total (1 active), Execution time: mean = 16.109 us, total = 599.062 ms, Queueing time: mean = 65.420 us, max = 41.182 ms, min = -0.000 s, total = 2.433 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29703 total (1 active), Execution time: mean = 433.183 us, total = 12.867 s, Queueing time: mean = 64.782 us, max = 27.346 ms, min = -0.000 s, total = 1.924 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7440 total (1 active), Execution time: mean = 8.288 us, total = 61.662 ms, Queueing time: mean = 172.077 us, max = 4.336 ms, min = -0.000 s, total = 1.280 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7440 total (1 active), Execution time: mean = 14.828 us, total = 110.320 ms, Queueing time: mean = 61.142 us, max = 3.804 ms, min = 7.553 us, total = 454.893 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7440 total (1 active), Execution time: mean = 3.162 us, total = 23.522 ms, Queueing time: mean = 175.431 us, max = 4.341 ms, min = 2.496 us, total = 1.305 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7438 total (0 active), Execution time: mean = 97.998 us, total = 728.909 ms, Queueing time: mean = 89.868 us, max = 2.573 ms, min = 2.559 us, total = 668.436 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7438 total (0 active), Execution time: mean = 545.935 us, total = 4.061 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2481 total (1 active), Execution time: mean = 8.012 us, total = 19.878 ms, Queueing time: mean = 68.544 us, max = 6.635 ms, min = 11.179 us, total = 170.058 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1488 total (0 active), Execution time: mean = 1.361 ms, total = 2.025 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1488 total (0 active), Execution time: mean = 49.730 us, total = 73.998 ms, Queueing time: mean = 89.640 us, max = 3.960 ms, min = 6.906 us, total = 133.384 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1488 total (1 active), Execution time: mean = 523.788 us, total = 779.397 ms, Queueing time: mean = 368.009 us, max = 2.197 ms, min = 6.917 us, total = 547.597 ms [state-dump] NodeManager.GcsCheckAlive - 1488 total (1 active), Execution time: mean = 295.226 us, total = 439.296 ms, Queueing time: mean = 596.238 us, max = 2.307 ms, min = 5.323 us, total = 887.201 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 744 total (1 active), Execution time: mean = 1.720 ms, total = 1.279 s, Queueing time: mean = 62.319 us, max = 1.632 ms, min = 10.033 us, total = 46.366 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 124 total (1 active, 1 running), Execution time: mean = 2.626 ms, total = 325.616 ms, Queueing time: mean = 60.272 us, max = 172.215 us, min = 13.784 us, total = 7.474 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:17:16,231 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:17:16,550 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 654545 total (35 active) [state-dump] Queueing time: mean = 5.499 ms, max = 590.169 s, min = -0.000 s, total = 3599.033 s [state-dump] Execution time: mean = 11.153 ms, total = 7299.990 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 157458 total (0 active), Execution time: mean = 446.748 us, total = 70.344 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 157458 total (0 active), Execution time: mean = 30.593 us, total = 4.817 s, Queueing time: mean = 91.286 us, max = 23.460 ms, min = 1.508 us, total = 14.374 s [state-dump] RaySyncer.OnDemandBroadcasting - 74937 total (1 active), Execution time: mean = 9.398 us, total = 704.235 ms, Queueing time: mean = 81.242 us, max = 65.085 ms, min = -0.000 s, total = 6.088 s [state-dump] NodeManager.CheckGC - 74937 total (1 active), Execution time: mean = 3.839 us, total = 287.665 ms, Queueing time: mean = 85.995 us, max = 60.039 ms, min = 3.126 us, total = 6.444 
s [state-dump] ObjectManager.UpdateAvailableMemory - 74936 total (0 active), Execution time: mean = 4.967 us, total = 372.198 ms, Queueing time: mean = 84.600 us, max = 48.698 ms, min = 2.040 us, total = 6.340 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 37487 total (1 active), Execution time: mean = 16.116 us, total = 604.125 ms, Queueing time: mean = 65.428 us, max = 41.182 ms, min = -0.000 s, total = 2.453 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29943 total (1 active), Execution time: mean = 433.083 us, total = 12.968 s, Queueing time: mean = 64.788 us, max = 27.346 ms, min = -0.000 s, total = 1.940 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7500 total (1 active), Execution time: mean = 8.288 us, total = 62.160 ms, Queueing time: mean = 172.081 us, max = 4.336 ms, min = -0.000 s, total = 1.291 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7500 total (1 active), Execution time: mean = 14.819 us, total = 111.141 ms, Queueing time: mean = 61.539 us, max = 3.804 ms, min = 7.553 us, total = 461.540 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7500 total (1 active), Execution time: mean = 3.161 us, total = 23.708 ms, Queueing time: mean = 175.434 us, max = 4.341 ms, min = 2.496 us, total = 1.316 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7498 total (0 active), Execution time: mean = 98.010 us, total = 734.880 ms, Queueing time: mean = 89.847 us, max = 2.573 ms, min = 2.559 us, total = 673.675 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7498 total (0 active), Execution time: mean = 546.005 us, total = 4.094 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2501 total (1 active), Execution time: mean = 8.008 us, total = 20.027 ms, Queueing time: mean = 68.445 us, max = 6.635 ms, min = 11.179 us, total = 
171.181 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1500 total (0 active), Execution time: mean = 1.359 ms, total = 2.039 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1500 total (0 active), Execution time: mean = 49.705 us, total = 74.558 ms, Queueing time: mean = 89.485 us, max = 3.960 ms, min = 6.906 us, total = 134.227 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1500 total (1 active), Execution time: mean = 523.714 us, total = 785.571 ms, Queueing time: mean = 368.155 us, max = 2.197 ms, min = 6.917 us, total = 552.233 ms [state-dump] NodeManager.GcsCheckAlive - 1500 total (1 active), Execution time: mean = 295.055 us, total = 442.582 ms, Queueing time: mean = 596.439 us, max = 2.307 ms, min = 5.323 us, total = 894.659 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 750 total (1 active), Execution time: mean = 1.720 ms, total = 1.290 s, Queueing time: mean = 62.213 us, max = 1.632 ms, min = 10.033 us, total = 46.660 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 125 total (1 active, 1 running), Execution time: mean = 2.629 ms, total = 328.603 ms, Queueing time: mean = 60.432 us, max = 172.215 us, min = 13.784 us, total = 7.554 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:18:16,232 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:18:16,553 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: 
{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group 
locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location 
updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 659773 total (35 active) [state-dump] Queueing time: mean = 5.455 ms, max 
= 590.169 s, min = -0.000 s, total = 3599.368 s [state-dump] Execution time: mean = 11.066 ms, total = 7300.772 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 158717 total (0 active), Execution time: mean = 446.689 us, total = 70.897 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 158717 total (0 active), Execution time: mean = 30.580 us, total = 4.854 s, Queueing time: mean = 91.272 us, max = 23.460 ms, min = 1.417 us, total = 14.486 s [state-dump] RaySyncer.OnDemandBroadcasting - 75537 total (1 active), Execution time: mean = 9.388 us, total = 709.162 ms, Queueing time: mean = 81.190 us, max = 65.085 ms, min = -0.000 s, total = 6.133 s [state-dump] NodeManager.CheckGC - 75537 total (1 active), Execution time: mean = 3.831 us, total = 289.412 ms, Queueing time: mean = 85.941 us, max = 60.039 ms, min = 3.126 us, total = 6.492 s [state-dump] ObjectManager.UpdateAvailableMemory - 75536 total (0 active), Execution time: mean = 4.965 us, total = 375.054 ms, Queueing time: mean = 84.610 us, max = 48.698 ms, min = 2.040 us, total = 6.391 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 37787 total (1 active), Execution time: mean = 16.097 us, total = 608.267 ms, Queueing time: mean = 65.371 us, max = 41.182 ms, min = -0.000 s, total = 2.470 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30182 total (1 active), Execution time: mean = 433.053 us, total = 13.070 s, Queueing time: mean = 64.726 us, max = 27.346 ms, min = -0.000 s, total = 1.954 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7560 total (1 active), Execution time: mean = 14.807 us, total = 111.942 ms, Queueing time: mean = 61.492 us, max = 3.804 ms, min = 7.553 us, total = 464.879 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7559 total (1 active), Execution time: mean = 8.280 us, 
total = 62.586 ms, Queueing time: mean = 172.227 us, max = 4.336 ms, min = -0.000 s, total = 1.302 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7559 total (1 active), Execution time: mean = 3.160 us, total = 23.886 ms, Queueing time: mean = 175.574 us, max = 4.341 ms, min = 2.496 us, total = 1.327 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7557 total (0 active), Execution time: mean = 98.038 us, total = 740.870 ms, Queueing time: mean = 89.836 us, max = 2.573 ms, min = 2.559 us, total = 678.889 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7557 total (0 active), Execution time: mean = 545.938 us, total = 4.126 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2521 total (1 active), Execution time: mean = 8.002 us, total = 20.172 ms, Queueing time: mean = 68.354 us, max = 6.635 ms, min = 11.179 us, total = 172.320 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1512 total (0 active), Execution time: mean = 1.357 ms, total = 2.052 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1512 total (0 active), Execution time: mean = 49.687 us, total = 75.127 ms, Queueing time: mean = 89.465 us, max = 3.960 ms, min = 6.906 us, total = 135.272 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1512 total (1 active), Execution time: mean = 523.513 us, total = 791.552 ms, Queueing time: mean = 368.928 us, max = 2.197 ms, min = 6.917 us, total = 557.820 ms [state-dump] NodeManager.GcsCheckAlive - 1512 total (1 active), Execution time: mean = 294.853 us, total = 445.818 ms, Queueing time: mean = 597.257 us, max = 2.307 ms, min = 5.323 us, total = 903.053 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 756 total (1 active), 
Execution time: mean = 1.721 ms, total = 1.301 s, Queueing time: mean = 62.216 us, max = 1.632 ms, min = 10.033 us, total = 47.035 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 126 total (1 active, 1 running), Execution time: mean = 2.629 ms, total = 331.258 ms, Queueing time: mean = 60.388 us, max = 172.215 us, min = 13.784 us, total = 7.609 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, 
Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:19:16,232 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:19:16,556 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 665005 total (35 active) [state-dump] Queueing time: mean = 5.413 ms, max = 590.169 s, min = -0.000 s, total = 3599.806 s [state-dump] Execution time: mean = 10.980 ms, total = 7301.728 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 159977 total (0 active), Execution time: mean = 447.514 us, total = 71.592 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 159977 total (0 active), Execution time: mean = 30.613 us, total = 4.897 s, Queueing time: mean = 91.599 us, max = 23.460 ms, min = 1.417 us, total = 14.654 s [state-dump] RaySyncer.OnDemandBroadcasting - 76136 total (1 active), Execution time: mean = 9.393 us, total = 715.166 ms, Queueing time: mean = 81.289 us, max = 65.085 ms, min = -0.000 s, total = 6.189 s [state-dump] NodeManager.CheckGC - 76136 total (1 active), Execution time: mean = 3.826 us, total = 291.261 ms, Queueing time: mean = 86.050 us, max = 60.039 ms, min = 3.126 us, total = 6.551 s [state-dump] ObjectManager.UpdateAvailableMemory - 76135 total (0 active), Execution time: mean = 4.974 us, total = 378.711 ms, Queueing time: mean = 84.857 us, max = 48.698 ms, min = 2.040 us, total = 6.461 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38087 total (1 active), Execution time: mean = 16.102 us, total = 613.282 ms, Queueing time: mean = 65.424 us, max = 41.182 ms, min = -0.000 s, total = 2.492 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30422 total (1 active), Execution time: mean = 433.260 us, total = 13.181 s, Queueing time: mean = 64.778 us, max = 27.346 ms, min = -0.000 s, total = 1.971 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7620 total (1 active), Execution time: mean = 14.820 us, 
total = 112.927 ms, Queueing time: mean = 61.580 us, max = 3.804 ms, min = 7.553 us, total = 469.236 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7619 total (1 active), Execution time: mean = 8.281 us, total = 63.096 ms, Queueing time: mean = 172.122 us, max = 4.336 ms, min = -0.000 s, total = 1.311 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7619 total (1 active), Execution time: mean = 3.161 us, total = 24.085 ms, Queueing time: mean = 175.470 us, max = 4.341 ms, min = 2.496 us, total = 1.337 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7617 total (0 active), Execution time: mean = 98.129 us, total = 747.450 ms, Queueing time: mean = 90.336 us, max = 2.573 ms, min = 2.559 us, total = 688.090 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7617 total (0 active), Execution time: mean = 547.098 us, total = 4.167 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2541 total (1 active), Execution time: mean = 8.002 us, total = 20.333 ms, Queueing time: mean = 68.355 us, max = 6.635 ms, min = 11.179 us, total = 173.690 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1524 total (0 active), Execution time: mean = 1.358 ms, total = 2.069 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1524 total (0 active), Execution time: mean = 49.745 us, total = 75.811 ms, Queueing time: mean = 89.664 us, max = 3.960 ms, min = 6.906 us, total = 136.648 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1524 total (1 active), Execution time: mean = 523.492 us, total = 797.803 ms, Queueing time: mean = 368.457 us, max = 2.197 ms, min = 6.917 us, total = 561.529 ms [state-dump] NodeManager.GcsCheckAlive - 1524 total (1 active), 
Execution time: mean = 294.672 us, total = 449.080 ms, Queueing time: mean = 596.895 us, max = 2.307 ms, min = 5.323 us, total = 909.669 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 762 total (1 active), Execution time: mean = 1.720 ms, total = 1.311 s, Queueing time: mean = 62.331 us, max = 1.632 ms, min = 10.033 us, total = 47.496 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 127 total (1 active, 1 running), Execution time: mean = 2.632 ms, total = 334.235 ms, Queueing time: mean = 60.497 us, max = 172.215 us, min = 13.784 us, total = 7.683 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:20:16,232 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:20:16,559 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 670239 total (35 active) [state-dump] Queueing time: mean = 5.372 ms, max = 590.169 s, min = -0.000 s, total = 3600.223 s [state-dump] Execution time: mean = 10.896 ms, total = 7302.668 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 161237 total (0 active), Execution time: mean = 448.242 us, total = 72.273 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 161237 total (0 active), Execution time: mean = 30.655 us, total = 4.943 s, Queueing time: mean = 91.819 us, max = 23.460 ms, min = 1.417 us, total = 14.805 s [state-dump] RaySyncer.OnDemandBroadcasting - 76736 total (1 active), Execution time: mean = 9.396 us, total = 721.015 ms, Queueing time: mean = 81.349 us, max = 65.085 ms, min = -0.000 s, total = 6.242 s [state-dump] NodeManager.CheckGC - 76736 total (1 active), Execution time: mean = 3.820 us, total = 293.100 ms, Queueing time: mean = 86.117 us, max = 60.039 ms, min = 3.126 us, total = 6.608 s [state-dump] ObjectManager.UpdateAvailableMemory - 76735 total (0 active), Execution time: mean = 4.982 us, total = 382.321 ms, Queueing time: mean = 85.070 us, max = 48.698 ms, min = 2.040 us, total = 6.528 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38387 total (1 active), Execution time: mean = 16.107 us, total = 618.288 ms, Queueing time: mean = 65.468 us, max = 41.182 ms, min = -0.000 s, total = 2.513 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30661 total (1 active), Execution time: mean = 433.369 us, total = 13.288 s, Queueing time: mean = 64.847 us, max = 27.346 
ms, min = -0.000 s, total = 1.988 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7680 total (1 active), Execution time: mean = 14.834 us, total = 113.925 ms, Queueing time: mean = 61.678 us, max = 3.804 ms, min = 7.553 us, total = 473.685 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7679 total (1 active), Execution time: mean = 8.286 us, total = 63.632 ms, Queueing time: mean = 172.243 us, max = 4.336 ms, min = -0.000 s, total = 1.323 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7679 total (1 active), Execution time: mean = 3.166 us, total = 24.309 ms, Queueing time: mean = 175.591 us, max = 4.341 ms, min = 2.496 us, total = 1.348 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7677 total (0 active), Execution time: mean = 98.221 us, total = 754.042 ms, Queueing time: mean = 90.523 us, max = 2.573 ms, min = 2.559 us, total = 694.941 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7677 total (0 active), Execution time: mean = 547.912 us, total = 4.206 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2561 total (1 active), Execution time: mean = 8.003 us, total = 20.495 ms, Queueing time: mean = 68.386 us, max = 6.635 ms, min = 11.179 us, total = 175.136 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1536 total (0 active), Execution time: mean = 1.359 ms, total = 2.088 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1536 total (0 active), Execution time: mean = 49.771 us, total = 76.448 ms, Queueing time: mean = 89.828 us, max = 3.960 ms, min = 6.906 us, total = 137.975 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1536 total (1 active), Execution time: mean = 523.561 us, total = 804.190 ms, Queueing 
time: mean = 368.876 us, max = 2.197 ms, min = 6.917 us, total = 566.593 ms [state-dump] NodeManager.GcsCheckAlive - 1536 total (1 active), Execution time: mean = 294.789 us, total = 452.797 ms, Queueing time: mean = 597.318 us, max = 2.307 ms, min = 5.323 us, total = 917.481 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 768 total (1 active), Execution time: mean = 1.721 ms, total = 1.322 s, Queueing time: mean = 62.363 us, max = 1.632 ms, min = 10.033 us, total = 47.895 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 128 total (1 active, 1 running), Execution time: mean = 2.634 ms, total = 337.138 ms, Queueing time: mean = 60.564 us, max = 172.215 us, min = 13.784 us, total = 7.752 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:21:16,233 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:21:16,562 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 675471 total (35 active) [state-dump] Queueing time: mean = 5.331 ms, max = 590.169 s, min = -0.000 s, total = 3600.648 s [state-dump] Execution time: mean = 10.813 ms, total = 7303.631 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 162497 total (0 active), Execution time: mean = 449.085 us, total = 72.975 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 162497 total (0 active), Execution time: mean = 30.695 us, total = 4.988 s, Queueing time: mean = 92.046 us, max = 23.460 ms, min = 1.417 us, total = 14.957 s [state-dump] RaySyncer.OnDemandBroadcasting - 77335 total (1 active), Execution time: mean = 9.395 us, total = 726.531 ms, Queueing time: mean = 81.411 us, max = 65.085 ms, min = -0.000 s, total = 6.296 s [state-dump] NodeManager.CheckGC - 77335 total (1 active), Execution time: mean = 3.813 us, total = 294.881 ms, Queueing time: mean = 86.183 us, max = 60.039 ms, min = 3.126 us, total = 6.665 s [state-dump] ObjectManager.UpdateAvailableMemory - 77334 total (0 active), Execution time: mean = 4.990 us, total = 385.861 ms, Queueing time: mean = 85.308 us, max = 48.698 ms, min = 2.040 us, total = 6.597 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38687 total (1 active), Execution time: mean = 16.109 us, total = 623.207 ms, Queueing time: mean = 65.557 
us, max = 41.182 ms, min = -0.000 s, total = 2.536 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30901 total (1 active), Execution time: mean = 433.516 us, total = 13.396 s, Queueing time: mean = 64.917 us, max = 27.346 ms, min = -0.000 s, total = 2.006 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7740 total (1 active), Execution time: mean = 14.842 us, total = 114.877 ms, Queueing time: mean = 61.715 us, max = 3.804 ms, min = 6.093 us, total = 477.673 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7739 total (1 active), Execution time: mean = 8.288 us, total = 64.143 ms, Queueing time: mean = 172.432 us, max = 4.336 ms, min = -0.000 s, total = 1.334 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7739 total (1 active), Execution time: mean = 3.167 us, total = 24.508 ms, Queueing time: mean = 175.780 us, max = 4.341 ms, min = 2.496 us, total = 1.360 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7737 total (0 active), Execution time: mean = 98.288 us, total = 760.452 ms, Queueing time: mean = 90.700 us, max = 2.573 ms, min = 2.559 us, total = 701.748 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7737 total (0 active), Execution time: mean = 548.663 us, total = 4.245 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2581 total (1 active), Execution time: mean = 8.006 us, total = 20.664 ms, Queueing time: mean = 68.511 us, max = 6.635 ms, min = 11.179 us, total = 176.826 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1548 total (0 active), Execution time: mean = 1.360 ms, total = 2.105 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1548 total (0 active), Execution time: mean = 49.826 us, total = 
77.131 ms, Queueing time: mean = 90.004 us, max = 3.960 ms, min = 6.906 us, total = 139.327 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1548 total (1 active), Execution time: mean = 524.232 us, total = 811.512 ms, Queueing time: mean = 369.267 us, max = 2.197 ms, min = 6.917 us, total = 571.625 ms [state-dump] NodeManager.GcsCheckAlive - 1548 total (1 active), Execution time: mean = 294.982 us, total = 456.633 ms, Queueing time: mean = 598.190 us, max = 2.395 ms, min = 5.323 us, total = 925.998 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 774 total (1 active), Execution time: mean = 1.723 ms, total = 1.334 s, Queueing time: mean = 62.315 us, max = 1.632 ms, min = 10.033 us, total = 48.232 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 129 total (1 active, 1 running), Execution time: mean = 2.636 ms, total = 340.051 ms, Queueing time: mean = 60.562 us, max = 172.215 us, min = 13.784 us, total = 7.813 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.197 s, total = 7198.760 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 326.409 us, total = 4.243 ms, Queueing time: mean = 94.875 us, max = 410.175 us, min = 20.320 us, total = 1.233 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:22:16,233 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:22:16,565 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 680704 total (35 active) [state-dump] Queueing time: mean = 5.290 ms, max = 590.169 s, min = -0.000 s, total = 3601.070 s [state-dump] Execution time: mean = 11.612 ms, total = 7904.570 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 163757 total (0 active), Execution time: mean = 449.808 us, total = 73.659 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 163757 total (0 active), Execution time: mean = 30.740 us, total = 5.034 s, Queueing time: mean = 92.285 us, max = 23.460 ms, min = 1.417 us, total = 15.112 s [state-dump] RaySyncer.OnDemandBroadcasting - 77934 total (1 active), Execution time: mean = 9.391 us, total = 731.874 ms, Queueing time: mean = 81.447 us, max = 65.085 ms, min = -0.000 s, total = 6.348 s [state-dump] NodeManager.CheckGC - 77934 total (1 active), Execution time: mean = 3.806 us, total = 296.587 ms, Queueing time: mean = 86.222 us, max = 60.039 ms, min = 3.126 us, total = 6.720 s [state-dump] ObjectManager.UpdateAvailableMemory - 77933 total (0 active), Execution time: mean = 4.996 us, total = 389.359 ms, Queueing time: mean = 85.578 us, max = 48.698 ms, min = 2.040 
us, total = 6.669 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38987 total (1 active), Execution time: mean = 16.101 us, total = 627.716 ms, Queueing time: mean = 65.608 us, max = 41.182 ms, min = -0.000 s, total = 2.558 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31140 total (1 active), Execution time: mean = 433.525 us, total = 13.500 s, Queueing time: mean = 65.005 us, max = 27.346 ms, min = -0.000 s, total = 2.024 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7800 total (1 active), Execution time: mean = 14.846 us, total = 115.801 ms, Queueing time: mean = 61.794 us, max = 3.804 ms, min = 6.093 us, total = 481.993 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7799 total (1 active), Execution time: mean = 8.291 us, total = 64.664 ms, Queueing time: mean = 172.479 us, max = 4.336 ms, min = -0.000 s, total = 1.345 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7799 total (1 active), Execution time: mean = 3.168 us, total = 24.705 ms, Queueing time: mean = 175.827 us, max = 4.341 ms, min = 2.496 us, total = 1.371 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7797 total (0 active), Execution time: mean = 98.348 us, total = 766.821 ms, Queueing time: mean = 90.896 us, max = 2.573 ms, min = 2.559 us, total = 708.714 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7797 total (0 active), Execution time: mean = 549.345 us, total = 4.283 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2601 total (1 active), Execution time: mean = 8.009 us, total = 20.830 ms, Queueing time: mean = 68.497 us, max = 6.635 ms, min = 11.179 us, total = 178.160 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1560 total (0 active), Execution time: mean = 1.361 ms, total = 2.123 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1560 total (0 active), Execution time: mean = 49.861 us, total = 77.784 ms, Queueing time: mean = 90.205 us, max = 3.960 ms, min = 6.906 us, total = 140.720 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1560 total (1 active), Execution time: mean = 524.372 us, total = 818.021 ms, Queueing time: mean = 369.346 us, max = 2.197 ms, min = 6.917 us, total = 576.179 ms [state-dump] NodeManager.GcsCheckAlive - 1560 total (1 active), Execution time: mean = 295.026 us, total = 460.240 ms, Queueing time: mean = 598.343 us, max = 2.395 ms, min = 5.323 us, total = 933.415 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 780 total (1 active), Execution time: mean = 1.724 ms, total = 1.344 s, Queueing time: mean = 62.403 us, max = 1.632 ms, min = 10.033 us, total = 48.675 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 130 total (1 active, 1 running), Execution time: mean = 2.639 ms, total = 343.016 ms, Queueing time: mean = 61.237 us, max = 172.215 us, min = 13.784 us, total = 7.961 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:23:16,233 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:23:16,568 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 685939 total (35 active) [state-dump] Queueing time: mean = 5.250 ms, max = 590.169 s, min = -0.000 s, total = 3601.496 s [state-dump] Execution time: mean = 11.525 ms, total = 7905.522 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 165017 total (0 active), Execution time: mean = 450.575 us, total = 74.353 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 165017 total (0 active), Execution time: mean = 30.781 us, total = 5.079 s, Queueing time: mean = 92.497 us, max = 23.460 ms, min = 1.417 us, total = 15.264 s [state-dump] RaySyncer.OnDemandBroadcasting - 78534 total (1 active), Execution time: mean = 9.389 us, total = 737.348 ms, Queueing time: mean = 81.517 us, max = 65.085 ms, min = -0.000 s, total = 6.402 s [state-dump] NodeManager.CheckGC - 78534 total (1 active), Execution time: mean = 3.799 us, total = 298.387 ms, Queueing time: mean = 86.295 us, max = 60.039 ms, min = 3.126 us, total = 6.777 
s [state-dump] ObjectManager.UpdateAvailableMemory - 78533 total (0 active), Execution time: mean = 5.004 us, total = 392.970 ms, Queueing time: mean = 85.844 us, max = 48.698 ms, min = 2.040 us, total = 6.742 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39287 total (1 active), Execution time: mean = 16.101 us, total = 632.563 ms, Queueing time: mean = 65.655 us, max = 41.182 ms, min = -0.000 s, total = 2.579 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31380 total (1 active), Execution time: mean = 433.616 us, total = 13.607 s, Queueing time: mean = 65.068 us, max = 27.346 ms, min = -0.000 s, total = 2.042 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7860 total (1 active), Execution time: mean = 14.856 us, total = 116.770 ms, Queueing time: mean = 61.887 us, max = 3.804 ms, min = 6.093 us, total = 486.432 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7859 total (1 active), Execution time: mean = 8.293 us, total = 65.174 ms, Queueing time: mean = 172.650 us, max = 4.336 ms, min = -0.000 s, total = 1.357 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7859 total (1 active), Execution time: mean = 3.168 us, total = 24.900 ms, Queueing time: mean = 176.000 us, max = 4.341 ms, min = 2.496 us, total = 1.383 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7857 total (0 active), Execution time: mean = 98.446 us, total = 773.493 ms, Queueing time: mean = 91.083 us, max = 2.573 ms, min = 2.559 us, total = 715.637 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7857 total (0 active), Execution time: mean = 549.997 us, total = 4.321 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2621 total (1 active), Execution time: mean = 8.014 us, total = 21.006 ms, Queueing time: mean = 68.573 us, max = 6.635 ms, min = 11.179 us, total = 
179.730 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1572 total (0 active), Execution time: mean = 1.362 ms, total = 2.141 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1572 total (0 active), Execution time: mean = 49.922 us, total = 78.477 ms, Queueing time: mean = 90.364 us, max = 3.960 ms, min = 6.906 us, total = 142.052 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1572 total (1 active), Execution time: mean = 524.830 us, total = 825.033 ms, Queueing time: mean = 369.767 us, max = 2.197 ms, min = 6.917 us, total = 581.274 ms [state-dump] NodeManager.GcsCheckAlive - 1572 total (1 active), Execution time: mean = 295.101 us, total = 463.899 ms, Queueing time: mean = 599.162 us, max = 2.395 ms, min = 5.323 us, total = 941.883 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 786 total (1 active), Execution time: mean = 1.725 ms, total = 1.356 s, Queueing time: mean = 62.587 us, max = 1.632 ms, min = 10.033 us, total = 49.193 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 131 total (1 active, 1 running), Execution time: mean = 2.639 ms, total = 345.657 ms, Queueing time: mean = 61.253 us, max = 172.215 us, min = 13.784 us, total = 8.024 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:24:16,233 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:24:16,571 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} 
[state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] 
- num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 691170 total (35 active) [state-dump] Queueing time: mean = 5.211 ms, max 
= 590.169 s, min = -0.000 s, total = 3601.923 s [state-dump] Execution time: mean = 11.439 ms, total = 7906.475 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 166277 total (0 active), Execution time: mean = 451.301 us, total = 75.041 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 166277 total (0 active), Execution time: mean = 30.825 us, total = 5.125 s, Queueing time: mean = 92.742 us, max = 23.460 ms, min = 1.417 us, total = 15.421 s [state-dump] RaySyncer.OnDemandBroadcasting - 79133 total (1 active), Execution time: mean = 9.399 us, total = 743.800 ms, Queueing time: mean = 81.594 us, max = 65.085 ms, min = -0.000 s, total = 6.457 s [state-dump] NodeManager.CheckGC - 79133 total (1 active), Execution time: mean = 3.795 us, total = 300.335 ms, Queueing time: mean = 86.386 us, max = 60.039 ms, min = 3.126 us, total = 6.836 s [state-dump] ObjectManager.UpdateAvailableMemory - 79132 total (0 active), Execution time: mean = 5.014 us, total = 396.761 ms, Queueing time: mean = 86.060 us, max = 48.698 ms, min = 2.040 us, total = 6.810 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39587 total (1 active), Execution time: mean = 16.118 us, total = 638.074 ms, Queueing time: mean = 65.742 us, max = 41.182 ms, min = -0.000 s, total = 2.603 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31619 total (1 active), Execution time: mean = 433.829 us, total = 13.717 s, Queueing time: mean = 65.137 us, max = 27.346 ms, min = -0.000 s, total = 2.060 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7920 total (1 active), Execution time: mean = 14.870 us, total = 117.767 ms, Queueing time: mean = 61.960 us, max = 3.804 ms, min = 6.093 us, total = 490.724 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7919 total (1 active), Execution time: mean = 8.307 us, 
total = 65.785 ms, Queueing time: mean = 172.610 us, max = 4.336 ms, min = -0.000 s, total = 1.367 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7919 total (1 active), Execution time: mean = 3.171 us, total = 25.110 ms, Queueing time: mean = 175.966 us, max = 4.341 ms, min = 2.496 us, total = 1.393 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7917 total (0 active), Execution time: mean = 98.545 us, total = 780.177 ms, Queueing time: mean = 91.363 us, max = 2.573 ms, min = 2.559 us, total = 723.318 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7917 total (0 active), Execution time: mean = 550.859 us, total = 4.361 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2641 total (1 active), Execution time: mean = 8.018 us, total = 21.176 ms, Queueing time: mean = 68.723 us, max = 6.635 ms, min = 11.179 us, total = 181.498 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1584 total (0 active), Execution time: mean = 1.363 ms, total = 2.159 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1584 total (0 active), Execution time: mean = 50.005 us, total = 79.208 ms, Queueing time: mean = 90.488 us, max = 3.960 ms, min = 6.906 us, total = 143.333 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1584 total (1 active), Execution time: mean = 524.968 us, total = 831.550 ms, Queueing time: mean = 369.458 us, max = 2.197 ms, min = 6.917 us, total = 585.221 ms [state-dump] NodeManager.GcsCheckAlive - 1584 total (1 active), Execution time: mean = 295.278 us, total = 467.720 ms, Queueing time: mean = 598.818 us, max = 2.395 ms, min = 5.323 us, total = 948.528 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 792 total (1 active), 
Execution time: mean = 1.725 ms, total = 1.366 s, Queueing time: mean = 62.599 us, max = 1.632 ms, min = 10.033 us, total = 49.578 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 132 total (1 active, 1 running), Execution time: mean = 2.642 ms, total = 348.690 ms, Queueing time: mean = 61.270 us, max = 172.215 us, min = 13.784 us, total = 8.088 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, 
total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:25:16,234 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:25:16,574 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 696405 total (35 active) [state-dump] Queueing time: mean = 5.173 ms, max = 590.169 s, min = -0.000 s, total = 3602.347 s [state-dump] Execution time: mean = 11.355 ms, total = 7907.412 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 167537 total (0 active), Execution time: mean = 451.950 us, total = 75.718 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 167537 total (0 active), Execution time: mean = 30.860 us, total = 5.170 s, Queueing time: mean = 92.953 us, max = 23.460 ms, min = 1.417 us, total = 15.573 s [state-dump] RaySyncer.OnDemandBroadcasting - 79733 total (1 active), Execution time: mean = 9.406 us, total = 749.950 ms, Queueing time: mean = 81.646 us, max = 65.085 ms, min = -0.000 s, total = 6.510 s [state-dump] NodeManager.CheckGC - 79733 total (1 active), Execution time: mean = 3.791 us, total = 302.240 ms, Queueing time: mean = 86.448 us, max = 60.039 ms, min = 3.126 us, total = 6.893 s [state-dump] ObjectManager.UpdateAvailableMemory - 79732 total (0 active), Execution time: mean = 5.023 us, total = 400.473 ms, Queueing time: mean = 86.275 us, max = 48.698 ms, min = 2.040 us, total = 6.879 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39887 total (1 active), Execution time: mean = 16.129 us, total = 643.320 ms, Queueing time: mean = 65.826 us, max = 41.182 ms, min = -0.000 s, total = 2.626 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31859 total (1 active), Execution time: mean = 433.917 us, total = 13.824 s, Queueing time: mean = 65.301 us, max = 27.346 ms, min = -0.000 s, total = 2.080 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7980 total (1 active), Execution time: mean = 14.887 us, 
total = 118.799 ms, Queueing time: mean = 62.016 us, max = 3.804 ms, min = 6.093 us, total = 494.888 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7979 total (1 active), Execution time: mean = 8.310 us, total = 66.308 ms, Queueing time: mean = 172.729 us, max = 4.336 ms, min = -0.000 s, total = 1.378 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7979 total (1 active), Execution time: mean = 3.173 us, total = 25.314 ms, Queueing time: mean = 176.084 us, max = 4.341 ms, min = 2.496 us, total = 1.405 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7977 total (0 active), Execution time: mean = 98.629 us, total = 786.766 ms, Queueing time: mean = 91.540 us, max = 2.573 ms, min = 2.559 us, total = 730.215 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7977 total (0 active), Execution time: mean = 551.578 us, total = 4.400 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2661 total (1 active), Execution time: mean = 8.018 us, total = 21.336 ms, Queueing time: mean = 68.875 us, max = 6.635 ms, min = 11.179 us, total = 183.276 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1596 total (0 active), Execution time: mean = 1.363 ms, total = 2.176 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1596 total (0 active), Execution time: mean = 50.115 us, total = 79.983 ms, Queueing time: mean = 90.621 us, max = 3.960 ms, min = 6.906 us, total = 144.631 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1596 total (1 active), Execution time: mean = 525.553 us, total = 838.783 ms, Queueing time: mean = 369.490 us, max = 2.197 ms, min = 6.917 us, total = 589.707 ms [state-dump] NodeManager.GcsCheckAlive - 1596 total (1 active), 
Execution time: mean = 295.570 us, total = 471.730 ms, Queueing time: mean = 599.168 us, max = 2.395 ms, min = 5.323 us, total = 956.273 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 798 total (1 active), Execution time: mean = 1.726 ms, total = 1.377 s, Queueing time: mean = 62.614 us, max = 1.632 ms, min = 10.033 us, total = 49.966 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 133 total (1 active, 1 running), Execution time: mean = 2.645 ms, total = 351.797 ms, Queueing time: mean = 61.348 us, max = 172.215 us, min = 13.784 us, total = 8.159 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean 
= 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:26:16,234 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:26:16,576 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 701635 total (35 active) [state-dump] Queueing time: mean = 5.135 ms, max = 590.169 s, min = -0.000 s, total = 3602.758 s [state-dump] Execution time: mean = 11.271 ms, total = 7908.358 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 168797 total (0 active), Execution time: mean = 452.645 us, total = 76.405 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 168797 total (0 active), Execution time: mean = 30.879 us, total = 5.212 s, Queueing time: mean = 93.100 us, max = 23.460 ms, min = 1.417 us, total = 15.715 s [state-dump] RaySyncer.OnDemandBroadcasting - 80332 total (1 active), Execution time: mean = 9.410 us, total = 755.952 ms, Queueing time: mean = 81.713 us, max = 65.085 ms, min = -0.000 s, total = 6.564 s [state-dump] NodeManager.CheckGC - 80332 total (1 active), Execution time: mean = 3.787 us, total = 304.190 ms, Queueing time: mean = 86.522 us, max = 60.039 ms, min = 3.126 us, total = 6.951 s [state-dump] ObjectManager.UpdateAvailableMemory - 80331 total (0 active), Execution time: mean = 5.030 us, total = 404.031 ms, Queueing time: mean = 86.456 us, max = 48.698 ms, min = 2.040 us, total = 6.945 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 40186 total (1 active), Execution time: mean = 16.143 us, total = 648.709 ms, Queueing time: mean = 65.893 us, max = 41.182 ms, min = -0.000 s, total = 2.648 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32098 total (1 active), Execution time: mean = 434.087 us, total = 13.933 s, Queueing time: mean = 65.379 us, max = 27.346 
ms, min = -0.000 s, total = 2.099 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8040 total (1 active), Execution time: mean = 14.893 us, total = 119.741 ms, Queueing time: mean = 62.586 us, max = 4.009 ms, min = 6.093 us, total = 503.188 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8039 total (1 active), Execution time: mean = 8.318 us, total = 66.870 ms, Queueing time: mean = 172.730 us, max = 4.336 ms, min = -0.000 s, total = 1.389 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8039 total (1 active), Execution time: mean = 3.174 us, total = 25.515 ms, Queueing time: mean = 176.089 us, max = 4.341 ms, min = 2.496 us, total = 1.416 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8037 total (0 active), Execution time: mean = 98.692 us, total = 793.189 ms, Queueing time: mean = 91.724 us, max = 2.573 ms, min = 2.559 us, total = 737.189 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8037 total (0 active), Execution time: mean = 552.482 us, total = 4.440 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2681 total (1 active), Execution time: mean = 8.023 us, total = 21.511 ms, Queueing time: mean = 68.968 us, max = 6.635 ms, min = 11.179 us, total = 184.902 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1608 total (0 active), Execution time: mean = 1.364 ms, total = 2.193 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1608 total (0 active), Execution time: mean = 50.153 us, total = 80.646 ms, Queueing time: mean = 90.666 us, max = 3.960 ms, min = 6.906 us, total = 145.790 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1608 total (1 active), Execution time: mean = 526.337 us, total = 846.349 ms, Queueing 
time: mean = 368.754 us, max = 2.197 ms, min = 6.917 us, total = 592.956 ms [state-dump] NodeManager.GcsCheckAlive - 1608 total (1 active), Execution time: mean = 295.747 us, total = 475.561 ms, Queueing time: mean = 599.023 us, max = 2.395 ms, min = 5.323 us, total = 963.230 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 804 total (1 active), Execution time: mean = 1.726 ms, total = 1.387 s, Queueing time: mean = 62.679 us, max = 1.632 ms, min = 10.033 us, total = 50.394 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 134 total (1 active, 1 running), Execution time: mean = 2.648 ms, total = 354.793 ms, Queueing time: mean = 61.502 us, max = 172.215 us, min = 13.784 us, total = 8.241 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 
159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 6.568 us, total = 59.110 us, Queueing time: mean = 49.409 us, max = 79.050 us, min = 26.627 us, total = 444.680 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:27:16,234 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:27:16,578 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 706871 total (35 active) [state-dump] Queueing time: mean = 5.097 ms, max = 590.169 s, min = -0.000 s, total = 3603.173 s [state-dump] Execution time: mean = 11.189 ms, total = 7909.304 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 170057 total (0 active), Execution time: mean = 453.381 us, total = 77.101 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 170057 total (0 active), Execution time: mean = 30.903 us, total = 5.255 s, Queueing time: mean = 93.306 us, max = 23.460 ms, min = 1.417 us, total = 15.867 s [state-dump] RaySyncer.OnDemandBroadcasting - 80932 total (1 active), Execution time: mean = 9.407 us, total = 761.294 ms, Queueing time: mean = 81.749 us, max = 65.085 ms, min = -0.000 s, total = 6.616 s [state-dump] NodeManager.CheckGC - 80932 total (1 active), Execution time: mean = 3.781 us, total = 305.980 ms, Queueing time: mean = 86.559 us, max = 60.039 ms, min = 3.126 us, total = 7.005 s [state-dump] ObjectManager.UpdateAvailableMemory - 80931 total (0 active), Execution time: mean = 5.035 us, total = 407.483 ms, Queueing time: mean = 86.658 us, max = 48.698 ms, min = 2.040 us, total = 7.013 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 40486 total (1 active), Execution time: mean = 16.142 us, total = 653.535 ms, Queueing time: mean = 65.956 
us, max = 41.182 ms, min = -0.000 s, total = 2.670 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32338 total (1 active), Execution time: mean = 434.103 us, total = 14.038 s, Queueing time: mean = 65.422 us, max = 27.346 ms, min = -0.000 s, total = 2.116 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8100 total (1 active), Execution time: mean = 14.895 us, total = 120.652 ms, Queueing time: mean = 62.640 us, max = 4.009 ms, min = 6.093 us, total = 507.387 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8099 total (1 active), Execution time: mean = 8.318 us, total = 67.366 ms, Queueing time: mean = 172.766 us, max = 4.336 ms, min = -0.000 s, total = 1.399 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8099 total (1 active), Execution time: mean = 3.174 us, total = 25.706 ms, Queueing time: mean = 176.126 us, max = 4.341 ms, min = 2.496 us, total = 1.426 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8097 total (0 active), Execution time: mean = 98.755 us, total = 799.622 ms, Queueing time: mean = 91.891 us, max = 2.573 ms, min = 2.559 us, total = 744.038 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8097 total (0 active), Execution time: mean = 553.182 us, total = 4.479 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2701 total (1 active), Execution time: mean = 8.025 us, total = 21.676 ms, Queueing time: mean = 69.187 us, max = 6.635 ms, min = 11.179 us, total = 186.875 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1620 total (0 active), Execution time: mean = 1.364 ms, total = 2.210 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1620 total (0 active), Execution time: mean = 50.224 us, total = 
81.364 ms, Queueing time: mean = 90.769 us, max = 3.960 ms, min = 6.906 us, total = 147.047 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1620 total (1 active), Execution time: mean = 526.335 us, total = 852.662 ms, Queueing time: mean = 368.970 us, max = 2.197 ms, min = 6.917 us, total = 597.731 ms [state-dump] NodeManager.GcsCheckAlive - 1620 total (1 active), Execution time: mean = 295.745 us, total = 479.106 ms, Queueing time: mean = 599.239 us, max = 2.395 ms, min = 5.323 us, total = 970.767 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 810 total (1 active), Execution time: mean = 1.727 ms, total = 1.399 s, Queueing time: mean = 62.624 us, max = 1.632 ms, min = 10.033 us, total = 50.725 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 135 total (1 active, 1 running), Execution time: mean = 2.643 ms, total = 356.744 ms, Queueing time: mean = 61.676 us, max = 172.215 us, min = 13.784 us, total = 8.326 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 
15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 
us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 
739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:28:16,234 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:28:16,581 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 712102 total (35 active) [state-dump] Queueing time: mean = 5.060 ms, max = 590.169 s, min = -0.000 s, total = 3603.519 s [state-dump] Execution time: mean = 11.108 ms, total = 7910.102 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 171317 total (0 active), Execution time: mean = 453.302 us, total = 77.658 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 171317 total (0 active), Execution time: mean = 30.888 us, total = 5.292 s, Queueing time: mean = 93.279 us, max = 23.460 ms, min = 1.417 us, total = 15.980 s [state-dump] RaySyncer.OnDemandBroadcasting - 81531 total (1 active), Execution time: mean = 9.403 us, total = 766.609 ms, Queueing time: mean = 81.722 us, max = 65.085 ms, min = -0.000 s, total = 6.663 s [state-dump] NodeManager.CheckGC - 81531 total (1 active), Execution time: mean = 3.774 us, total = 307.696 ms, Queueing time: mean = 86.535 us, max = 60.039 ms, min = 3.126 us, total = 7.055 s [state-dump] ObjectManager.UpdateAvailableMemory - 81530 total (0 active), Execution time: mean = 5.035 us, total = 410.464 ms, Queueing time: mean = 86.679 us, max = 48.698 ms, min = 2.040 
us, total = 7.067 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 40786 total (1 active), Execution time: mean = 16.140 us, total = 658.271 ms, Queueing time: mean = 65.952 us, max = 41.182 ms, min = -0.000 s, total = 2.690 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32577 total (1 active), Execution time: mean = 434.076 us, total = 14.141 s, Queueing time: mean = 65.398 us, max = 27.346 ms, min = -0.000 s, total = 2.130 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8160 total (1 active), Execution time: mean = 14.886 us, total = 121.471 ms, Queueing time: mean = 62.621 us, max = 4.009 ms, min = 6.093 us, total = 510.986 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8159 total (1 active), Execution time: mean = 8.317 us, total = 67.860 ms, Queueing time: mean = 172.856 us, max = 4.336 ms, min = -0.000 s, total = 1.410 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8159 total (1 active), Execution time: mean = 3.179 us, total = 25.937 ms, Queueing time: mean = 176.211 us, max = 4.341 ms, min = 2.496 us, total = 1.438 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8157 total (0 active), Execution time: mean = 98.811 us, total = 805.999 ms, Queueing time: mean = 91.967 us, max = 2.573 ms, min = 2.559 us, total = 750.175 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8157 total (0 active), Execution time: mean = 553.785 us, total = 4.517 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2721 total (1 active), Execution time: mean = 8.021 us, total = 21.826 ms, Queueing time: mean = 69.130 us, max = 6.635 ms, min = 11.179 us, total = 188.103 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1632 total (0 active), Execution time: mean = 1.364 ms, total = 2.226 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1632 total (0 active), Execution time: mean = 50.228 us, total = 81.971 ms, Queueing time: mean = 90.774 us, max = 3.960 ms, min = 6.906 us, total = 148.143 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1632 total (1 active), Execution time: mean = 526.565 us, total = 859.355 ms, Queueing time: mean = 369.154 us, max = 2.197 ms, min = 6.917 us, total = 602.459 ms [state-dump] NodeManager.GcsCheckAlive - 1632 total (1 active), Execution time: mean = 295.675 us, total = 482.542 ms, Queueing time: mean = 599.711 us, max = 2.395 ms, min = 5.323 us, total = 978.729 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 816 total (1 active), Execution time: mean = 1.727 ms, total = 1.409 s, Queueing time: mean = 62.734 us, max = 1.632 ms, min = 10.033 us, total = 51.191 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 136 total (1 active, 1 running), Execution time: mean = 2.641 ms, total = 359.126 ms, Queueing time: mean = 61.745 us, max = 172.215 us, min = 13.784 us, total = 8.397 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:29:16,235 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:29:16,584 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 717337 total (35 active) [state-dump] Queueing time: mean = 5.024 ms, max = 590.169 s, min = -0.000 s, total = 3603.776 s [state-dump] Execution time: mean = 11.028 ms, total = 7910.782 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 172577 total (0 active), Execution time: mean = 452.682 us, total = 78.123 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 172577 total (0 active), Execution time: mean = 30.833 us, total = 5.321 s, Queueing time: mean = 93.037 us, max = 23.460 ms, min = 1.417 us, total = 16.056 s [state-dump] RaySyncer.OnDemandBroadcasting - 82131 total (1 active), Execution time: mean = 9.391 us, total = 771.290 ms, Queueing time: mean = 81.583 us, max = 65.085 ms, min = -0.000 s, total = 6.701 s [state-dump] NodeManager.CheckGC - 82131 total (1 active), Execution time: mean = 3.767 us, total = 309.359 ms, Queueing time: mean = 86.392 us, max = 60.039 ms, min = 3.126 us, total = 7.095 
s [state-dump] ObjectManager.UpdateAvailableMemory - 82130 total (0 active), Execution time: mean = 5.028 us, total = 412.929 ms, Queueing time: mean = 86.502 us, max = 48.698 ms, min = 2.040 us, total = 7.104 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41086 total (1 active), Execution time: mean = 16.109 us, total = 661.844 ms, Queueing time: mean = 65.830 us, max = 41.182 ms, min = -0.000 s, total = 2.705 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32817 total (1 active), Execution time: mean = 433.933 us, total = 14.240 s, Queueing time: mean = 65.297 us, max = 27.346 ms, min = -0.000 s, total = 2.143 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8220 total (1 active), Execution time: mean = 14.871 us, total = 122.241 ms, Queueing time: mean = 62.500 us, max = 4.009 ms, min = 6.093 us, total = 513.753 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8219 total (1 active), Execution time: mean = 8.304 us, total = 68.248 ms, Queueing time: mean = 172.826 us, max = 4.336 ms, min = -0.000 s, total = 1.420 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8219 total (1 active), Execution time: mean = 3.176 us, total = 26.103 ms, Queueing time: mean = 176.174 us, max = 4.341 ms, min = 2.496 us, total = 1.448 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8217 total (0 active), Execution time: mean = 98.784 us, total = 811.712 ms, Queueing time: mean = 91.756 us, max = 2.573 ms, min = 2.559 us, total = 753.958 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8217 total (0 active), Execution time: mean = 553.365 us, total = 4.547 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2741 total (1 active), Execution time: mean = 8.016 us, total = 21.972 ms, Queueing time: mean = 69.005 us, max = 6.635 ms, min = 11.179 us, total = 
189.144 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1644 total (0 active), Execution time: mean = 1.363 ms, total = 2.240 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1644 total (0 active), Execution time: mean = 50.210 us, total = 82.545 ms, Queueing time: mean = 90.537 us, max = 3.960 ms, min = 6.906 us, total = 148.842 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1644 total (1 active), Execution time: mean = 526.690 us, total = 865.878 ms, Queueing time: mean = 368.818 us, max = 2.197 ms, min = 6.917 us, total = 606.337 ms [state-dump] NodeManager.GcsCheckAlive - 1644 total (1 active), Execution time: mean = 295.649 us, total = 486.047 ms, Queueing time: mean = 599.542 us, max = 2.395 ms, min = 5.323 us, total = 985.648 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 822 total (1 active), Execution time: mean = 1.726 ms, total = 1.419 s, Queueing time: mean = 62.554 us, max = 1.632 ms, min = 10.033 us, total = 51.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 137 total (1 active, 1 running), Execution time: mean = 2.640 ms, total = 361.735 ms, Queueing time: mean = 61.749 us, max = 172.215 us, min = 13.784 us, total = 8.460 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:30:16,235 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:30:16,587 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} 
[state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] 
- num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 722568 total (35 active) [state-dump] Queueing time: mean = 4.988 ms, max 
= 590.169 s, min = -0.000 s, total = 3604.200 s [state-dump] Execution time: mean = 10.949 ms, total = 7911.745 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 173837 total (0 active), Execution time: mean = 453.468 us, total = 78.829 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 173837 total (0 active), Execution time: mean = 30.882 us, total = 5.368 s, Queueing time: mean = 93.247 us, max = 23.460 ms, min = 1.417 us, total = 16.210 s [state-dump] RaySyncer.OnDemandBroadcasting - 82730 total (1 active), Execution time: mean = 9.388 us, total = 776.668 ms, Queueing time: mean = 81.636 us, max = 65.085 ms, min = -0.000 s, total = 6.754 s [state-dump] NodeManager.CheckGC - 82730 total (1 active), Execution time: mean = 3.761 us, total = 311.122 ms, Queueing time: mean = 86.448 us, max = 60.039 ms, min = 3.126 us, total = 7.152 s [state-dump] ObjectManager.UpdateAvailableMemory - 82729 total (0 active), Execution time: mean = 5.034 us, total = 416.457 ms, Queueing time: mean = 86.750 us, max = 48.698 ms, min = 2.040 us, total = 7.177 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41386 total (1 active), Execution time: mean = 16.112 us, total = 666.824 ms, Queueing time: mean = 65.948 us, max = 41.182 ms, min = -0.000 s, total = 2.729 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33056 total (1 active), Execution time: mean = 433.971 us, total = 14.345 s, Queueing time: mean = 65.345 us, max = 27.346 ms, min = -0.000 s, total = 2.160 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8280 total (1 active), Execution time: mean = 14.874 us, total = 123.156 ms, Queueing time: mean = 62.560 us, max = 4.009 ms, min = 6.093 us, total = 517.999 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8279 total (1 active), Execution time: mean = 8.305 us, 
total = 68.755 ms, Queueing time: mean = 172.821 us, max = 4.336 ms, min = -0.000 s, total = 1.431 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8279 total (1 active), Execution time: mean = 3.177 us, total = 26.298 ms, Queueing time: mean = 176.169 us, max = 4.341 ms, min = 2.496 us, total = 1.459 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8277 total (0 active), Execution time: mean = 98.842 us, total = 818.115 ms, Queueing time: mean = 91.893 us, max = 2.573 ms, min = 2.559 us, total = 760.599 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8277 total (0 active), Execution time: mean = 553.968 us, total = 4.585 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2761 total (1 active), Execution time: mean = 8.015 us, total = 22.130 ms, Queueing time: mean = 69.047 us, max = 6.635 ms, min = 11.179 us, total = 190.637 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1656 total (0 active), Execution time: mean = 1.363 ms, total = 2.258 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1656 total (0 active), Execution time: mean = 50.244 us, total = 83.204 ms, Queueing time: mean = 90.692 us, max = 3.960 ms, min = 6.906 us, total = 150.186 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1656 total (1 active), Execution time: mean = 527.073 us, total = 872.834 ms, Queueing time: mean = 368.459 us, max = 2.197 ms, min = 6.917 us, total = 610.168 ms [state-dump] NodeManager.GcsCheckAlive - 1656 total (1 active), Execution time: mean = 295.720 us, total = 489.712 ms, Queueing time: mean = 599.493 us, max = 2.395 ms, min = 5.323 us, total = 992.760 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 828 total (1 active), 
Execution time: mean = 1.727 ms, total = 1.430 s, Queueing time: mean = 62.547 us, max = 1.632 ms, min = 10.033 us, total = 51.789 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 138 total (1 active, 1 running), Execution time: mean = 2.641 ms, total = 364.411 ms, Queueing time: mean = 62.097 us, max = 172.215 us, min = 13.784 us, total = 8.569 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, 
total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: 
mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:31:16,235 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:31:16,588 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects 
pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num 
objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 727801 total (35 active) [state-dump] Queueing time: mean = 4.953 ms, max = 590.169 s, min = -0.000 s, total = 3604.536 s [state-dump] Execution time: mean = 10.872 ms, total = 7912.517 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 175096 total (0 active), Execution time: mean = 453.274 us, total = 79.366 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 175096 total (0 active), Execution time: mean = 30.866 us, total = 5.404 s, Queueing time: mean = 93.209 us, max = 23.460 ms, min = 1.417 us, total = 16.320 s [state-dump] RaySyncer.OnDemandBroadcasting - 83330 total (1 active), Execution time: mean = 9.385 us, total = 782.017 ms, Queueing time: mean = 81.604 us, max = 65.085 ms, min = -0.000 s, total = 6.800 s [state-dump] NodeManager.CheckGC - 83330 total (1 active), Execution time: mean = 3.755 us, total = 312.886 ms, Queueing time: mean = 86.417 us, max = 60.039 ms, min = 3.126 us, total = 7.201 s [state-dump] ObjectManager.UpdateAvailableMemory - 83329 total (0 active), Execution time: mean = 5.034 us, total = 419.517 ms, Queueing time: mean = 86.754 us, max = 48.698 ms, min = 2.040 us, total = 7.229 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41686 total (1 active), Execution time: mean = 16.109 us, total = 671.535 ms, Queueing time: mean = 65.978 us, max = 41.182 ms, min = -0.000 s, total = 2.750 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33296 total (1 active), Execution time: mean = 433.950 us, total = 14.449 s, Queueing time: mean = 65.316 us, max = 27.346 ms, min = -0.000 s, total = 2.175 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8340 total (1 active), Execution time: mean = 14.865 us, 
total = 123.973 ms, Queueing time: mean = 62.557 us, max = 4.009 ms, min = 6.093 us, total = 521.728 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8339 total (1 active), Execution time: mean = 8.299 us, total = 69.207 ms, Queueing time: mean = 172.734 us, max = 4.336 ms, min = -0.000 s, total = 1.440 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8339 total (1 active), Execution time: mean = 3.176 us, total = 26.483 ms, Queueing time: mean = 176.079 us, max = 4.341 ms, min = 2.496 us, total = 1.468 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8337 total (0 active), Execution time: mean = 98.843 us, total = 824.050 ms, Queueing time: mean = 91.837 us, max = 2.573 ms, min = 2.559 us, total = 765.642 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8337 total (0 active), Execution time: mean = 553.969 us, total = 4.618 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2781 total (1 active), Execution time: mean = 8.016 us, total = 22.292 ms, Queueing time: mean = 69.017 us, max = 6.635 ms, min = 11.179 us, total = 191.937 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1668 total (0 active), Execution time: mean = 1.363 ms, total = 2.273 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1668 total (0 active), Execution time: mean = 50.214 us, total = 83.756 ms, Queueing time: mean = 90.735 us, max = 3.960 ms, min = 6.906 us, total = 151.346 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1668 total (1 active), Execution time: mean = 527.103 us, total = 879.208 ms, Queueing time: mean = 368.121 us, max = 2.197 ms, min = 6.917 us, total = 614.026 ms [state-dump] NodeManager.GcsCheckAlive - 1668 total (1 active), 
Execution time: mean = 295.494 us, total = 492.884 ms, Queueing time: mean = 599.298 us, max = 2.395 ms, min = 5.323 us, total = 999.629 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 834 total (1 active), Execution time: mean = 1.727 ms, total = 1.440 s, Queueing time: mean = 62.506 us, max = 1.632 ms, min = 10.033 us, total = 52.130 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 139 total (1 active, 1 running), Execution time: mean = 2.644 ms, total = 367.461 ms, Queueing time: mean = 61.827 us, max = 172.215 us, min = 13.784 us, total = 8.594 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 
ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.917 s, total = 7798.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 327.248 us, total = 4.581 ms, Queueing time: mean = 98.878 us, max = 410.175 us, min = 20.320 us, total = 1.384 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean 
= 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), 
Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:32:16,235 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:32:16,591 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 
Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes 
currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - 
first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 733034 total (35 active) [state-dump] Queueing time: mean = 4.918 ms, max = 590.169 s, min = -0.000 s, total = 3604.891 s [state-dump] Execution time: mean = 11.614 ms, total = 8513.340 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 176356 total (0 active), Execution time: mean = 453.382 us, total = 79.957 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 176356 total (0 active), Execution time: mean = 30.862 us, total = 5.443 s, Queueing time: mean = 93.256 us, max = 23.460 ms, min = 1.417 us, total = 16.446 s [state-dump] RaySyncer.OnDemandBroadcasting - 83929 total (1 active), Execution time: mean = 9.377 us, total = 787.036 ms, Queueing time: mean = 81.571 us, max = 65.085 ms, min = -0.000 s, total = 6.846 s [state-dump] NodeManager.CheckGC - 83929 total (1 active), Execution time: mean = 3.748 us, total = 314.575 ms, Queueing time: mean = 86.384 us, max = 60.039 ms, min = 3.126 us, total = 7.250 s [state-dump] ObjectManager.UpdateAvailableMemory - 83928 total (0 active), Execution time: mean = 5.034 us, total = 422.458 ms, Queueing time: mean = 86.758 us, max = 48.698 ms, min = 2.040 us, total = 7.281 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41986 total (1 active), Execution time: mean = 16.094 us, total = 675.723 ms, Queueing time: mean = 65.976 us, max = 41.182 ms, min = -0.000 s, total = 2.770 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33535 total (1 active), Execution time: mean = 433.931 us, total = 14.552 s, Queueing time: mean = 65.279 us, max = 27.346 
ms, min = -0.000 s, total = 2.189 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8400 total (1 active), Execution time: mean = 14.853 us, total = 124.764 ms, Queueing time: mean = 62.554 us, max = 4.009 ms, min = 6.093 us, total = 525.451 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8399 total (1 active), Execution time: mean = 8.296 us, total = 69.679 ms, Queueing time: mean = 172.849 us, max = 4.336 ms, min = -0.000 s, total = 1.452 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8399 total (1 active), Execution time: mean = 3.176 us, total = 26.675 ms, Queueing time: mean = 176.193 us, max = 4.341 ms, min = 2.496 us, total = 1.480 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8397 total (0 active), Execution time: mean = 98.830 us, total = 829.877 ms, Queueing time: mean = 91.830 us, max = 2.573 ms, min = 2.559 us, total = 771.095 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8397 total (0 active), Execution time: mean = 553.943 us, total = 4.651 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2801 total (1 active), Execution time: mean = 8.014 us, total = 22.448 ms, Queueing time: mean = 68.927 us, max = 6.635 ms, min = 11.179 us, total = 193.064 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1680 total (0 active), Execution time: mean = 1.362 ms, total = 2.288 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1680 total (0 active), Execution time: mean = 50.182 us, total = 84.306 ms, Queueing time: mean = 90.594 us, max = 3.960 ms, min = 6.906 us, total = 152.198 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1680 total (1 active), Execution time: mean = 527.123 us, total = 885.567 ms, Queueing 
time: mean = 368.557 us, max = 2.197 ms, min = 6.917 us, total = 619.175 ms [state-dump] NodeManager.GcsCheckAlive - 1680 total (1 active), Execution time: mean = 295.495 us, total = 496.432 ms, Queueing time: mean = 599.846 us, max = 2.395 ms, min = 5.323 us, total = 1.008 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 840 total (1 active), Execution time: mean = 1.727 ms, total = 1.451 s, Queueing time: mean = 62.441 us, max = 1.632 ms, min = 10.033 us, total = 52.450 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 140 total (1 active, 1 running), Execution time: mean = 2.634 ms, total = 368.741 ms, Queueing time: mean = 61.547 us, max = 172.215 us, min = 13.784 us, total = 8.617 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] 
CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 
159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:33:16,235 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:33:16,593 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 738269 total (35 active) [state-dump] Queueing time: mean = 4.883 ms, max = 590.169 s, min = -0.000 s, total = 3605.310 s [state-dump] Execution time: mean = 11.533 ms, total = 8514.295 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 177616 total (0 active), Execution time: mean = 454.111 us, total = 80.657 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 177616 total (0 active), Execution time: mean = 30.888 us, total = 5.486 s, Queueing time: mean = 93.467 us, max = 23.460 ms, min = 1.417 us, total = 16.601 s [state-dump] RaySyncer.OnDemandBroadcasting - 84529 total (1 active), Execution time: mean = 9.374 us, total = 792.367 ms, Queueing time: mean = 81.585 us, max = 65.085 ms, min = -0.000 s, total = 6.896 s [state-dump] NodeManager.CheckGC - 84529 total (1 active), Execution time: mean = 3.742 us, total = 316.321 ms, Queueing time: mean = 86.399 us, max = 60.039 ms, min = 3.126 us, total = 7.303 s [state-dump] ObjectManager.UpdateAvailableMemory - 84528 total (0 active), Execution time: mean = 5.039 us, total = 425.955 ms, Queueing time: mean = 86.990 us, max = 48.698 ms, min = 2.040 us, total = 7.353 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 42286 total (1 active), Execution time: mean = 16.089 us, total = 680.318 ms, Queueing time: mean = 66.037 
us, max = 41.182 ms, min = -0.000 s, total = 2.792 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33775 total (1 active), Execution time: mean = 434.007 us, total = 14.659 s, Queueing time: mean = 65.336 us, max = 27.346 ms, min = -0.000 s, total = 2.207 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8460 total (1 active), Execution time: mean = 14.852 us, total = 125.650 ms, Queueing time: mean = 62.585 us, max = 4.009 ms, min = 6.093 us, total = 529.471 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8459 total (1 active), Execution time: mean = 8.299 us, total = 70.204 ms, Queueing time: mean = 172.943 us, max = 4.336 ms, min = -0.000 s, total = 1.463 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8459 total (1 active), Execution time: mean = 3.177 us, total = 26.877 ms, Queueing time: mean = 176.288 us, max = 4.341 ms, min = 2.496 us, total = 1.491 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8457 total (0 active), Execution time: mean = 98.903 us, total = 836.421 ms, Queueing time: mean = 91.994 us, max = 2.573 ms, min = 2.559 us, total = 777.993 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8457 total (0 active), Execution time: mean = 554.541 us, total = 4.690 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2821 total (1 active), Execution time: mean = 8.019 us, total = 22.622 ms, Queueing time: mean = 68.963 us, max = 6.635 ms, min = 11.179 us, total = 194.544 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1692 total (0 active), Execution time: mean = 1.363 ms, total = 2.305 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1692 total (0 active), Execution time: mean = 50.229 us, total = 
84.987 ms, Queueing time: mean = 90.704 us, max = 3.960 ms, min = 6.906 us, total = 153.472 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1692 total (1 active), Execution time: mean = 527.824 us, total = 893.078 ms, Queueing time: mean = 368.331 us, max = 2.197 ms, min = 6.917 us, total = 623.216 ms [state-dump] NodeManager.GcsCheckAlive - 1692 total (1 active), Execution time: mean = 295.589 us, total = 500.136 ms, Queueing time: mean = 600.228 us, max = 2.395 ms, min = 5.323 us, total = 1.016 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 846 total (1 active), Execution time: mean = 1.728 ms, total = 1.462 s, Queueing time: mean = 62.531 us, max = 1.632 ms, min = 10.033 us, total = 52.901 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 141 total (1 active, 1 running), Execution time: mean = 2.634 ms, total = 371.393 ms, Queueing time: mean = 61.575 us, max = 172.215 us, min = 13.784 us, total = 8.682 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 
us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 
us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:34:16,236 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:34:16,595 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 743501 total (35 active) [state-dump] Queueing time: mean = 4.850 ms, max = 590.169 s, min = -0.000 s, total = 3605.740 s [state-dump] Execution time: mean = 11.453 ms, total = 8515.238 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 178876 total (0 active), Execution time: mean = 454.776 us, total = 81.348 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 178876 total (0 active), Execution time: mean = 30.909 us, total = 5.529 s, Queueing time: mean = 93.660 us, max = 23.460 ms, min = 1.417 us, total = 16.753 s [state-dump] RaySyncer.OnDemandBroadcasting - 85128 total (1 active), Execution time: mean = 9.371 us, total = 797.758 ms, Queueing time: mean = 81.639 us, max = 65.085 ms, min = -0.000 s, total = 6.950 s [state-dump] NodeManager.CheckGC - 85128 total (1 active), Execution time: mean = 3.736 us, total = 318.068 ms, Queueing time: mean = 86.455 us, max = 60.039 ms, min = 3.126 us, total = 7.360 s [state-dump] ObjectManager.UpdateAvailableMemory - 85127 total (0 active), Execution time: mean = 5.046 us, total = 429.512 ms, Queueing time: mean = 87.233 us, max = 48.698 ms, min = 2.040 
us, total = 7.426 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 42586 total (1 active), Execution time: mean = 16.085 us, total = 685.001 ms, Queueing time: mean = 66.130 us, max = 41.182 ms, min = -0.000 s, total = 2.816 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34015 total (1 active), Execution time: mean = 434.053 us, total = 14.764 s, Queueing time: mean = 65.417 us, max = 27.346 ms, min = -0.000 s, total = 2.225 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8520 total (1 active), Execution time: mean = 14.874 us, total = 126.729 ms, Queueing time: mean = 62.670 us, max = 4.009 ms, min = 6.093 us, total = 533.946 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8519 total (1 active), Execution time: mean = 8.301 us, total = 70.719 ms, Queueing time: mean = 173.082 us, max = 4.336 ms, min = -0.000 s, total = 1.474 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8519 total (1 active), Execution time: mean = 3.178 us, total = 27.072 ms, Queueing time: mean = 176.428 us, max = 4.341 ms, min = 2.496 us, total = 1.503 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8517 total (0 active), Execution time: mean = 98.958 us, total = 842.828 ms, Queueing time: mean = 92.177 us, max = 2.573 ms, min = 2.559 us, total = 785.069 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8517 total (0 active), Execution time: mean = 555.141 us, total = 4.728 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2841 total (1 active), Execution time: mean = 8.021 us, total = 22.788 ms, Queueing time: mean = 69.000 us, max = 6.635 ms, min = 11.179 us, total = 196.028 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1704 total (0 active), Execution time: mean = 1.363 ms, total = 2.323 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1704 total (0 active), Execution time: mean = 50.277 us, total = 85.672 ms, Queueing time: mean = 90.919 us, max = 3.960 ms, min = 6.906 us, total = 154.926 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1704 total (1 active), Execution time: mean = 527.794 us, total = 899.362 ms, Queueing time: mean = 369.058 us, max = 2.197 ms, min = 6.917 us, total = 628.874 ms [state-dump] NodeManager.GcsCheckAlive - 1704 total (1 active), Execution time: mean = 295.606 us, total = 503.713 ms, Queueing time: mean = 600.926 us, max = 2.395 ms, min = 5.323 us, total = 1.024 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 852 total (1 active), Execution time: mean = 1.729 ms, total = 1.473 s, Queueing time: mean = 62.874 us, max = 1.632 ms, min = 10.033 us, total = 53.568 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 142 total (1 active, 1 running), Execution time: mean = 2.626 ms, total = 372.950 ms, Queueing time: mean = 61.505 us, max = 172.215 us, min = 13.784 us, total = 8.734 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 
1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:35:16,236 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:35:16,598 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 748734 total (35 active) [state-dump] Queueing time: mean = 4.816 ms, max = 590.169 s, min = -0.000 s, total = 3606.170 s [state-dump] Execution time: mean = 11.374 ms, total = 8516.191 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 180136 total (0 active), Execution time: mean = 455.464 us, total = 82.045 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 180136 total (0 active), Execution time: mean = 30.944 us, total = 5.574 s, Queueing time: mean = 93.859 us, max = 23.460 ms, min = 1.417 us, total = 16.907 s [state-dump] RaySyncer.OnDemandBroadcasting - 85728 total (1 active), Execution time: mean = 9.369 us, total = 803.172 ms, Queueing time: mean = 81.707 us, max = 65.085 ms, min = -0.000 s, total = 7.005 s [state-dump] NodeManager.CheckGC - 85728 total (1 active), Execution time: mean = 3.731 us, total = 319.851 ms, Queueing time: mean = 86.526 us, max = 60.039 ms, min = 3.126 us, total = 7.418 
s [state-dump] ObjectManager.UpdateAvailableMemory - 85727 total (0 active), Execution time: mean = 5.052 us, total = 433.081 ms, Queueing time: mean = 87.486 us, max = 48.698 ms, min = 2.040 us, total = 7.500 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 42885 total (1 active), Execution time: mean = 16.084 us, total = 689.744 ms, Queueing time: mean = 66.200 us, max = 41.182 ms, min = -0.000 s, total = 2.839 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34254 total (1 active), Execution time: mean = 434.106 us, total = 14.870 s, Queueing time: mean = 65.546 us, max = 27.346 ms, min = -0.000 s, total = 2.245 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8580 total (1 active), Execution time: mean = 14.888 us, total = 127.736 ms, Queueing time: mean = 62.723 us, max = 4.009 ms, min = 6.093 us, total = 538.164 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8579 total (1 active), Execution time: mean = 8.305 us, total = 71.246 ms, Queueing time: mean = 173.115 us, max = 4.336 ms, min = -0.000 s, total = 1.485 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8579 total (1 active), Execution time: mean = 3.179 us, total = 27.275 ms, Queueing time: mean = 176.462 us, max = 4.341 ms, min = 2.496 us, total = 1.514 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8577 total (0 active), Execution time: mean = 99.013 us, total = 849.231 ms, Queueing time: mean = 92.333 us, max = 2.573 ms, min = 2.559 us, total = 791.938 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8577 total (0 active), Execution time: mean = 555.866 us, total = 4.768 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2861 total (1 active), Execution time: mean = 8.025 us, total = 22.960 ms, Queueing time: mean = 69.005 us, max = 6.635 ms, min = 11.179 us, total = 
197.423 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1716 total (0 active), Execution time: mean = 1.364 ms, total = 2.340 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1716 total (0 active), Execution time: mean = 50.295 us, total = 86.307 ms, Queueing time: mean = 91.050 us, max = 3.960 ms, min = 6.906 us, total = 156.241 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1716 total (1 active), Execution time: mean = 528.420 us, total = 906.769 ms, Queueing time: mean = 368.584 us, max = 2.197 ms, min = 6.917 us, total = 632.490 ms [state-dump] NodeManager.GcsCheckAlive - 1716 total (1 active), Execution time: mean = 295.617 us, total = 507.278 ms, Queueing time: mean = 601.079 us, max = 2.395 ms, min = 5.323 us, total = 1.031 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 858 total (1 active), Execution time: mean = 1.729 ms, total = 1.484 s, Queueing time: mean = 62.927 us, max = 1.632 ms, min = 10.033 us, total = 53.992 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 143 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 375.624 ms, Queueing time: mean = 62.649 us, max = 225.125 us, min = 13.784 us, total = 8.959 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:36:16,236 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:36:16,600 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} 
[state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] 
- num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 753966 total (35 active) [state-dump] Queueing time: mean = 4.783 ms, max 
= 590.169 s, min = -0.000 s, total = 3606.593 s [state-dump] Execution time: mean = 11.296 ms, total = 8517.149 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 181396 total (0 active), Execution time: mean = 456.187 us, total = 82.751 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 181396 total (0 active), Execution time: mean = 30.972 us, total = 5.618 s, Queueing time: mean = 94.066 us, max = 23.460 ms, min = 1.417 us, total = 17.063 s [state-dump] RaySyncer.OnDemandBroadcasting - 86327 total (1 active), Execution time: mean = 9.366 us, total = 808.552 ms, Queueing time: mean = 81.763 us, max = 65.085 ms, min = -0.000 s, total = 7.058 s [state-dump] NodeManager.CheckGC - 86327 total (1 active), Execution time: mean = 3.725 us, total = 321.601 ms, Queueing time: mean = 86.583 us, max = 60.039 ms, min = 3.126 us, total = 7.474 s [state-dump] ObjectManager.UpdateAvailableMemory - 86326 total (0 active), Execution time: mean = 5.057 us, total = 436.587 ms, Queueing time: mean = 87.707 us, max = 48.698 ms, min = 2.040 us, total = 7.571 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 43185 total (1 active), Execution time: mean = 16.078 us, total = 694.333 ms, Queueing time: mean = 66.230 us, max = 41.182 ms, min = -0.000 s, total = 2.860 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34494 total (1 active), Execution time: mean = 434.166 us, total = 14.976 s, Queueing time: mean = 65.590 us, max = 27.346 ms, min = -0.000 s, total = 2.262 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8640 total (1 active), Execution time: mean = 14.900 us, total = 128.739 ms, Queueing time: mean = 62.775 us, max = 4.009 ms, min = 6.093 us, total = 542.380 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8639 total (1 active), Execution time: mean = 8.309 us, 
total = 71.782 ms, Queueing time: mean = 173.094 us, max = 4.336 ms, min = -0.000 s, total = 1.495 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8639 total (1 active), Execution time: mean = 3.179 us, total = 27.467 ms, Queueing time: mean = 176.442 us, max = 4.341 ms, min = 2.496 us, total = 1.524 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8637 total (0 active), Execution time: mean = 99.069 us, total = 855.660 ms, Queueing time: mean = 92.506 us, max = 2.573 ms, min = 2.559 us, total = 798.973 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8637 total (0 active), Execution time: mean = 556.488 us, total = 4.806 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2881 total (1 active), Execution time: mean = 8.027 us, total = 23.127 ms, Queueing time: mean = 69.029 us, max = 6.635 ms, min = 11.179 us, total = 198.871 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1728 total (0 active), Execution time: mean = 1.364 ms, total = 2.357 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1728 total (0 active), Execution time: mean = 50.319 us, total = 86.951 ms, Queueing time: mean = 91.268 us, max = 3.960 ms, min = 6.906 us, total = 157.711 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1728 total (1 active), Execution time: mean = 528.308 us, total = 912.916 ms, Queueing time: mean = 368.590 us, max = 2.197 ms, min = 6.917 us, total = 636.924 ms [state-dump] NodeManager.GcsCheckAlive - 1728 total (1 active), Execution time: mean = 295.614 us, total = 510.822 ms, Queueing time: mean = 600.965 us, max = 2.395 ms, min = 5.323 us, total = 1.038 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 864 total (1 active), Execution 
time: mean = 1.729 ms, total = 1.494 s, Queueing time: mean = 62.923 us, max = 1.632 ms, min = 10.033 us, total = 54.366 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 144 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 378.303 ms, Queueing time: mean = 62.655 us, max = 225.125 us, min = 13.784 us, total = 9.022 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 
889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 
12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:37:16,237 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:37:16,604 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": 
{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 
[state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num 
pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: 
[state-dump] Event stats: [state-dump] Global stats: 759159 total (35 active) [state-dump] Queueing time: mean = 4.751 ms, max = 590.169 s, min = -0.000 s, total = 3607.011 s [state-dump] Execution time: mean = 11.220 ms, total = 8518.093 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 182637 total (0 active), Execution time: mean = 456.835 us, total = 83.435 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 182637 total (0 active), Execution time: mean = 31.009 us, total = 5.663 s, Queueing time: mean = 94.267 us, max = 23.460 ms, min = 1.417 us, total = 17.217 s [state-dump] RaySyncer.OnDemandBroadcasting - 86926 total (1 active), Execution time: mean = 9.369 us, total = 814.446 ms, Queueing time: mean = 81.808 us, max = 65.085 ms, min = -0.000 s, total = 7.111 s [state-dump] NodeManager.CheckGC - 86926 total (1 active), Execution time: mean = 3.722 us, total = 323.513 ms, Queueing time: mean = 86.634 us, max = 60.039 ms, min = 3.126 us, total = 7.531 s [state-dump] ObjectManager.UpdateAvailableMemory - 86925 total (0 active), Execution time: mean = 5.063 us, total = 440.135 ms, Queueing time: mean = 87.842 us, max = 48.698 ms, min = 2.040 us, total = 7.636 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 43485 total (1 active), Execution time: mean = 16.083 us, total = 699.368 ms, Queueing time: mean = 66.307 us, max = 41.182 ms, min = -0.000 s, total = 2.883 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34733 total (1 active), Execution time: mean = 434.292 us, total = 15.084 s, Queueing time: mean = 65.662 us, max = 27.346 ms, min = -0.000 s, total = 2.281 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8700 total (1 active), Execution time: mean = 14.907 us, total = 129.690 ms, Queueing time: mean = 62.852 us, max = 4.009 ms, min = 6.093 us, total = 
546.815 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8699 total (1 active), Execution time: mean = 8.312 us, total = 72.307 ms, Queueing time: mean = 173.175 us, max = 4.336 ms, min = -0.000 s, total = 1.506 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8699 total (1 active), Execution time: mean = 3.181 us, total = 27.668 ms, Queueing time: mean = 176.524 us, max = 4.341 ms, min = 2.496 us, total = 1.536 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8697 total (0 active), Execution time: mean = 99.128 us, total = 862.114 ms, Queueing time: mean = 92.680 us, max = 2.573 ms, min = 2.559 us, total = 806.035 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8697 total (0 active), Execution time: mean = 557.111 us, total = 4.845 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2901 total (1 active), Execution time: mean = 8.034 us, total = 23.306 ms, Queueing time: mean = 69.049 us, max = 6.635 ms, min = 11.179 us, total = 200.310 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1740 total (0 active), Execution time: mean = 1.365 ms, total = 2.375 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1740 total (0 active), Execution time: mean = 50.323 us, total = 87.562 ms, Queueing time: mean = 91.431 us, max = 3.960 ms, min = 6.906 us, total = 159.090 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1740 total (1 active), Execution time: mean = 528.510 us, total = 919.608 ms, Queueing time: mean = 368.821 us, max = 2.197 ms, min = 6.917 us, total = 641.749 ms [state-dump] NodeManager.GcsCheckAlive - 1740 total (1 active), Execution time: mean = 295.697 us, total = 514.512 ms, Queueing time: mean = 601.306 us, max = 2.395 
ms, min = 5.323 us, total = 1.046 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 870 total (1 active), Execution time: mean = 1.730 ms, total = 1.505 s, Queueing time: mean = 62.945 us, max = 1.632 ms, min = 10.033 us, total = 54.763 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 145 total (1 active, 1 running), Execution time: mean = 2.624 ms, total = 380.506 ms, Queueing time: mean = 62.563 us, max = 225.125 us, min = 13.784 us, total = 9.072 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] 
CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:38:16,237 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:38:16,606 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: 
[10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative 
spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - 
num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 764394 total (35 active) [state-dump] Queueing time: mean = 4.719 ms, max = 590.169 s, min = -0.000 s, total = 3607.448 s [state-dump] Execution time: mean = 11.145 ms, total = 8519.067 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 183897 total (0 active), Execution time: mean = 457.587 us, total = 84.149 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 183897 total (0 active), Execution time: mean = 31.050 us, total = 5.710 s, Queueing time: mean = 94.494 us, max = 23.460 ms, min = 1.417 us, total = 17.377 s [state-dump] RaySyncer.OnDemandBroadcasting - 87526 total (1 active), Execution time: mean = 9.370 us, total = 820.129 ms, Queueing time: mean = 81.899 us, max = 65.085 ms, min = -0.000 s, total = 7.168 s [state-dump] NodeManager.CheckGC - 87526 total (1 active), Execution time: mean = 3.717 us, total = 325.325 ms, Queueing time: mean = 86.730 us, max = 60.039 ms, min = 3.126 us, total = 7.591 s [state-dump] ObjectManager.UpdateAvailableMemory - 87525 total (0 active), Execution time: mean = 5.070 us, total = 443.743 ms, Queueing time: mean = 88.055 us, max = 48.698 ms, min = 2.040 us, total = 7.707 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 43785 total (1 active), Execution time: mean = 16.088 us, total = 704.435 ms, Queueing time: mean = 66.363 us, max = 41.182 ms, min = -0.000 s, total = 2.906 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34973 total (1 active), Execution time: mean = 434.411 us, total = 15.193 s, Queueing time: mean = 65.694 us, max = 27.346 ms, min = -0.000 s, total = 2.297 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 8760 total (1 active), Execution time: mean = 14.931 us, total = 130.796 ms, Queueing time: mean = 62.901 us, max = 4.009 ms, min = 6.093 us, total = 551.009 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8759 total (1 active), Execution time: mean = 8.314 us, total = 72.825 ms, Queueing time: mean = 173.231 us, max = 4.336 ms, min = -0.000 s, total = 1.517 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8759 total (1 active), Execution time: mean = 3.181 us, total = 27.860 ms, Queueing time: mean = 176.582 us, max = 4.341 ms, min = 2.496 us, total = 1.547 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8757 total (0 active), Execution time: mean = 99.167 us, total = 868.407 ms, Queueing time: mean = 92.779 us, max = 2.573 ms, min = 2.559 us, total = 812.469 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8757 total (0 active), Execution time: mean = 557.714 us, total = 4.884 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2921 total (1 active), Execution time: mean = 8.041 us, total = 23.488 ms, Queueing time: mean = 69.089 us, max = 6.635 ms, min = 11.179 us, total = 201.810 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1752 total (0 active), Execution time: mean = 1.366 ms, total = 2.393 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1752 total (0 active), Execution time: mean = 50.355 us, total = 88.222 ms, Queueing time: mean = 91.605 us, max = 3.960 ms, min = 6.906 us, total = 160.491 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1752 total (1 active), Execution time: mean = 528.748 us, total = 926.366 ms, Queueing time: mean = 368.892 us, max = 2.197 
ms, min = 6.917 us, total = 646.299 ms [state-dump] NodeManager.GcsCheckAlive - 1752 total (1 active), Execution time: mean = 295.693 us, total = 518.053 ms, Queueing time: mean = 601.616 us, max = 2.395 ms, min = 5.323 us, total = 1.054 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 876 total (1 active), Execution time: mean = 1.731 ms, total = 1.516 s, Queueing time: mean = 62.993 us, max = 1.632 ms, min = 10.033 us, total = 55.182 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 146 total (1 active, 1 running), Execution time: mean = 2.626 ms, total = 383.448 ms, Queueing time: mean = 62.623 us, max = 225.125 us, min = 13.784 us, total = 9.143 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 
total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms 
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total 
= 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution 
time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:39:16,237 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:39:16,609 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 
[state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num 
objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - 
first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION 
[state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 769625 total (35 active) [state-dump] Queueing time: mean = 4.688 ms, max = 590.169 s, min = -0.000 s, total = 3607.869 s [state-dump] Execution time: mean = 11.070 ms, total = 8520.019 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 185157 total (0 active), Execution time: mean = 458.246 us, total = 84.847 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 185157 total (0 active), Execution time: mean = 31.079 us, total = 5.754 s, Queueing time: mean = 94.657 us, max = 23.460 ms, min = 1.417 us, total = 17.526 s [state-dump] RaySyncer.OnDemandBroadcasting - 88125 total (1 active), Execution time: mean = 9.368 us, total = 825.590 ms, Queueing time: mean = 81.948 us, max = 65.085 ms, min = -0.000 s, total = 7.222 s [state-dump] NodeManager.CheckGC - 88125 total (1 active), Execution time: mean = 3.712 us, total = 327.087 ms, Queueing time: mean = 86.782 us, max = 60.039 ms, min = 3.126 us, total = 7.648 s [state-dump] ObjectManager.UpdateAvailableMemory - 88124 total (0 active), Execution time: mean = 5.075 us, total = 447.262 ms, Queueing time: mean = 88.260 us, max = 48.698 ms, min = 2.040 us, total = 7.778 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 44085 total (1 active), Execution time: mean = 16.086 us, total = 709.134 ms, Queueing time: mean = 66.403 us, max = 41.182 ms, min = -0.000 s, total = 2.927 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 35212 total (1 active), Execution time: 
mean = 434.446 us, total = 15.298 s, Queueing time: mean = 65.786 us, max = 27.346 ms, min = -0.000 s, total = 2.316 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8820 total (1 active), Execution time: mean = 14.941 us, total = 131.783 ms, Queueing time: mean = 62.965 us, max = 4.009 ms, min = 6.093 us, total = 555.349 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8819 total (1 active), Execution time: mean = 8.322 us, total = 73.392 ms, Queueing time: mean = 173.308 us, max = 4.336 ms, min = -0.000 s, total = 1.528 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8819 total (1 active), Execution time: mean = 3.182 us, total = 28.059 ms, Queueing time: mean = 176.661 us, max = 4.341 ms, min = 2.496 us, total = 1.558 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8817 total (0 active), Execution time: mean = 99.205 us, total = 874.688 ms, Queueing time: mean = 92.970 us, max = 2.573 ms, min = 2.559 us, total = 819.719 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8817 total (0 active), Execution time: mean = 558.359 us, total = 4.923 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2941 total (1 active), Execution time: mean = 8.043 us, total = 23.654 ms, Queueing time: mean = 69.086 us, max = 6.635 ms, min = 11.179 us, total = 203.183 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1764 total (0 active), Execution time: mean = 1.366 ms, total = 2.410 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1764 total (0 active), Execution time: mean = 50.384 us, total = 88.878 ms, Queueing time: mean = 91.740 us, max = 3.960 ms, min = 6.906 us, total = 161.830 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1764 
total (1 active), Execution time: mean = 528.930 us, total = 933.033 ms, Queueing time: mean = 369.096 us, max = 2.197 ms, min = 6.917 us, total = 651.086 ms [state-dump] NodeManager.GcsCheckAlive - 1764 total (1 active), Execution time: mean = 295.678 us, total = 521.576 ms, Queueing time: mean = 602.031 us, max = 2.395 ms, min = 5.323 us, total = 1.062 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 882 total (1 active), Execution time: mean = 1.731 ms, total = 1.527 s, Queueing time: mean = 63.091 us, max = 1.632 ms, min = 10.033 us, total = 55.646 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 147 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 386.177 ms, Queueing time: mean = 62.604 us, max = 225.125 us, min = 13.784 us, total = 9.203 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 
28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean 
= 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution 
time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, 
total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:40:16,237 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:40:16,611 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 774860 total (35 active) [state-dump] Queueing time: mean = 4.657 ms, max = 590.169 s, min = -0.000 s, total = 3608.288 s [state-dump] Execution time: mean = 10.997 ms, total = 8520.969 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 186417 total (0 active), Execution time: mean = 458.842 us, total = 85.536 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 186417 total (0 active), Execution time: mean = 31.108 us, total = 5.799 s, Queueing time: mean = 94.822 us, max = 23.460 ms, min = 1.417 us, total = 17.677 s [state-dump] RaySyncer.OnDemandBroadcasting - 88725 total (1 active), Execution time: mean = 9.372 us, total = 831.494 ms, Queueing time: mean = 81.993 us, max = 65.085 ms, min = -0.000 s, total = 7.275 s [state-dump] NodeManager.CheckGC - 88725 total (1 active), Execution time: mean = 3.708 us, total = 328.957 ms, Queueing time: mean = 86.833 us, max = 60.039 ms, min = 3.126 us, total = 7.704 s [state-dump] ObjectManager.UpdateAvailableMemory - 88724 total (0 active), Execution time: mean = 5.083 us, total = 450.975 ms, Queueing time: mean = 88.466 us, max = 48.698 ms, min = 2.040 us, total = 7.849 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 44385 total (1 active), Execution time: mean = 16.101 us, total = 714.651 ms, Queueing time: mean = 66.494 
us, max = 41.182 ms, min = -0.000 s, total = 2.951 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 35452 total (1 active), Execution time: mean = 434.580 us, total = 15.407 s, Queueing time: mean = 65.844 us, max = 27.346 ms, min = -0.000 s, total = 2.334 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8880 total (1 active), Execution time: mean = 14.952 us, total = 132.778 ms, Queueing time: mean = 63.005 us, max = 4.009 ms, min = 6.093 us, total = 559.481 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8879 total (1 active), Execution time: mean = 8.326 us, total = 73.922 ms, Queueing time: mean = 173.343 us, max = 4.336 ms, min = -0.000 s, total = 1.539 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8879 total (1 active), Execution time: mean = 3.183 us, total = 28.262 ms, Queueing time: mean = 176.697 us, max = 4.341 ms, min = 2.496 us, total = 1.569 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8877 total (0 active), Execution time: mean = 99.265 us, total = 881.172 ms, Queueing time: mean = 93.060 us, max = 2.573 ms, min = 2.559 us, total = 826.095 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8877 total (0 active), Execution time: mean = 558.929 us, total = 4.962 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2961 total (1 active), Execution time: mean = 8.060 us, total = 23.865 ms, Queueing time: mean = 69.126 us, max = 6.635 ms, min = 11.179 us, total = 204.682 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1776 total (0 active), Execution time: mean = 1.366 ms, total = 2.427 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1776 total (0 active), Execution time: mean = 50.397 us, total = 
89.504 ms, Queueing time: mean = 91.997 us, max = 3.960 ms, min = 6.906 us, total = 163.387 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1776 total (1 active), Execution time: mean = 529.753 us, total = 940.841 ms, Queueing time: mean = 368.498 us, max = 2.197 ms, min = 6.917 us, total = 654.453 ms [state-dump] NodeManager.GcsCheckAlive - 1776 total (1 active), Execution time: mean = 295.752 us, total = 525.255 ms, Queueing time: mean = 602.166 us, max = 2.395 ms, min = 5.323 us, total = 1.069 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 888 total (1 active), Execution time: mean = 1.732 ms, total = 1.538 s, Queueing time: mean = 63.128 us, max = 1.632 ms, min = 10.033 us, total = 56.058 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 148 total (1 active, 1 running), Execution time: mean = 2.629 ms, total = 389.156 ms, Queueing time: mean = 62.648 us, max = 225.125 us, min = 13.784 us, total = 9.272 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 
us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 
us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:41:16,238 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:41:16,614 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 780091 total (35 active) [state-dump] Queueing time: mean = 4.626 ms, max = 590.169 s, min = -0.000 s, total = 3608.604 s [state-dump] Execution time: mean = 10.924 ms, total = 8521.702 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 187677 total (0 active), Execution time: mean = 458.466 us, total = 86.044 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 187677 total (0 active), Execution time: mean = 31.080 us, total = 5.833 s, Queueing time: mean = 94.740 us, max = 23.460 ms, min = 1.417 us, total = 17.780 s [state-dump] RaySyncer.OnDemandBroadcasting - 89324 total (1 active), Execution time: mean = 9.367 us, total = 836.738 ms, Queueing time: mean = 81.934 us, max = 65.085 ms, min = -0.000 s, total = 7.319 s [state-dump] NodeManager.CheckGC - 89324 total (1 active), Execution time: mean = 3.702 us, total = 330.645 ms, Queueing time: mean = 86.776 us, max = 60.039 ms, min = 3.126 us, total = 7.751 s [state-dump] ObjectManager.UpdateAvailableMemory - 89323 total (0 active), Execution time: mean = 5.081 us, total = 453.841 ms, Queueing time: mean = 88.415 us, max = 48.698 ms, min = 2.040 
us, total = 7.897 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 44685 total (1 active), Execution time: mean = 16.095 us, total = 719.217 ms, Queueing time: mean = 66.426 us, max = 41.182 ms, min = -0.000 s, total = 2.968 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 35691 total (1 active), Execution time: mean = 434.475 us, total = 15.507 s, Queueing time: mean = 65.807 us, max = 27.346 ms, min = -0.000 s, total = 2.349 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8940 total (1 active), Execution time: mean = 14.955 us, total = 133.696 ms, Queueing time: mean = 62.988 us, max = 4.009 ms, min = 6.093 us, total = 563.111 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8939 total (1 active), Execution time: mean = 8.321 us, total = 74.383 ms, Queueing time: mean = 173.276 us, max = 4.336 ms, min = -0.000 s, total = 1.549 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8939 total (1 active), Execution time: mean = 3.181 us, total = 28.437 ms, Queueing time: mean = 176.629 us, max = 4.341 ms, min = 2.496 us, total = 1.579 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8937 total (0 active), Execution time: mean = 99.341 us, total = 887.806 ms, Queueing time: mean = 93.037 us, max = 2.573 ms, min = 2.559 us, total = 831.474 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8937 total (0 active), Execution time: mean = 558.940 us, total = 4.995 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2981 total (1 active), Execution time: mean = 8.053 us, total = 24.006 ms, Queueing time: mean = 69.030 us, max = 6.635 ms, min = 11.179 us, total = 205.777 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1788 total (0 active), Execution time: mean = 1.365 ms, total = 2.441 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1788 total (0 active), Execution time: mean = 50.379 us, total = 90.079 ms, Queueing time: mean = 91.966 us, max = 3.960 ms, min = 6.906 us, total = 164.436 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1788 total (1 active), Execution time: mean = 529.543 us, total = 946.824 ms, Queueing time: mean = 368.283 us, max = 2.197 ms, min = 6.917 us, total = 658.491 ms [state-dump] NodeManager.GcsCheckAlive - 1788 total (1 active), Execution time: mean = 295.604 us, total = 528.541 ms, Queueing time: mean = 601.875 us, max = 2.395 ms, min = 5.323 us, total = 1.076 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 894 total (1 active), Execution time: mean = 1.731 ms, total = 1.548 s, Queueing time: mean = 63.027 us, max = 1.632 ms, min = 10.033 us, total = 56.346 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 149 total (1 active, 1 running), Execution time: mean = 2.622 ms, total = 390.674 ms, Queueing time: mean = 62.472 us, max = 225.125 us, min = 13.784 us, total = 9.308 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.923 s, total = 8398.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 329.957 us, total = 4.949 ms, Queueing time: mean = 107.041 us, max = 410.175 us, min = 20.320 us, total = 
1.606 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 6.790 us, total = 67.903 us, Queueing time: mean = 50.768 us, max = 79.050 us, min = 26.627 us, total = 507.681 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:42:16,238 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:42:16,617 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 785329 total (35 active) [state-dump] Queueing time: mean = 4.595 ms, max = 590.169 s, min = -0.000 s, total = 3608.922 s [state-dump] Execution time: mean = 11.616 ms, total = 9122.394 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 188937 total (0 active), Execution time: mean = 457.898 us, total = 86.514 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 188937 total (0 active), Execution time: mean = 31.039 us, total = 5.864 s, Queueing time: mean = 94.692 us, max = 23.460 ms, min = 1.417 us, total = 17.891 s [state-dump] RaySyncer.OnDemandBroadcasting - 89924 total (1 active), Execution time: mean = 9.364 us, total = 842.004 ms, Queueing time: mean = 81.881 us, max = 65.085 ms, min = -0.000 s, total = 7.363 s [state-dump] NodeManager.CheckGC - 89924 total (1 active), Execution time: mean = 3.697 us, total = 332.436 ms, Queueing time: mean = 86.724 us, max = 60.039 ms, min = 3.126 us, total = 7.799 
s [state-dump] ObjectManager.UpdateAvailableMemory - 89923 total (0 active), Execution time: mean = 5.077 us, total = 456.557 ms, Queueing time: mean = 88.314 us, max = 48.698 ms, min = 2.040 us, total = 7.941 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 44985 total (1 active), Execution time: mean = 16.084 us, total = 723.520 ms, Queueing time: mean = 66.345 us, max = 41.182 ms, min = -0.000 s, total = 2.985 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 35931 total (1 active), Execution time: mean = 434.344 us, total = 15.606 s, Queueing time: mean = 65.746 us, max = 27.346 ms, min = -0.000 s, total = 2.362 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9000 total (1 active), Execution time: mean = 14.953 us, total = 134.577 ms, Queueing time: mean = 62.952 us, max = 4.009 ms, min = 6.093 us, total = 566.571 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8999 total (1 active), Execution time: mean = 8.314 us, total = 74.822 ms, Queueing time: mean = 173.224 us, max = 4.336 ms, min = -0.000 s, total = 1.559 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8999 total (1 active), Execution time: mean = 3.179 us, total = 28.612 ms, Queueing time: mean = 176.574 us, max = 4.341 ms, min = 2.496 us, total = 1.589 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8997 total (0 active), Execution time: mean = 99.321 us, total = 893.591 ms, Queueing time: mean = 92.962 us, max = 2.573 ms, min = 2.559 us, total = 836.382 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8997 total (0 active), Execution time: mean = 558.551 us, total = 5.025 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3001 total (1 active), Execution time: mean = 8.057 us, total = 24.178 ms, Queueing time: mean = 68.934 us, max = 6.635 ms, min = 11.179 us, total = 
206.872 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1800 total (0 active), Execution time: mean = 1.364 ms, total = 2.456 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1800 total (0 active), Execution time: mean = 50.352 us, total = 90.633 ms, Queueing time: mean = 91.810 us, max = 3.960 ms, min = 6.906 us, total = 165.257 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1800 total (1 active), Execution time: mean = 529.536 us, total = 953.164 ms, Queueing time: mean = 368.173 us, max = 2.197 ms, min = 6.917 us, total = 662.712 ms [state-dump] NodeManager.GcsCheckAlive - 1800 total (1 active), Execution time: mean = 295.551 us, total = 531.991 ms, Queueing time: mean = 601.826 us, max = 2.395 ms, min = 5.323 us, total = 1.083 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 900 total (1 active), Execution time: mean = 1.731 ms, total = 1.558 s, Queueing time: mean = 62.997 us, max = 1.632 ms, min = 10.033 us, total = 56.698 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 150 total (1 active, 1 running), Execution time: mean = 2.624 ms, total = 393.671 ms, Queueing time: mean = 62.553 us, max = 225.125 us, min = 13.784 us, total = 9.383 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] 
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:43:16,238 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:43:16,620 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, 
node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} 
[state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] 
- num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 790560 total (35 active) [state-dump] Queueing time: mean = 4.565 ms, max 
= 590.169 s, min = -0.000 s, total = 3609.279 s [state-dump] Execution time: mean = 11.540 ms, total = 9123.214 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 190197 total (0 active), Execution time: mean = 457.947 us, total = 87.100 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 190197 total (0 active), Execution time: mean = 31.029 us, total = 5.902 s, Queueing time: mean = 94.714 us, max = 23.460 ms, min = 1.417 us, total = 18.014 s [state-dump] RaySyncer.OnDemandBroadcasting - 90523 total (1 active), Execution time: mean = 9.357 us, total = 846.991 ms, Queueing time: mean = 81.860 us, max = 65.085 ms, min = -0.000 s, total = 7.410 s [state-dump] NodeManager.CheckGC - 90523 total (1 active), Execution time: mean = 3.691 us, total = 334.153 ms, Queueing time: mean = 86.701 us, max = 60.039 ms, min = 3.126 us, total = 7.848 s [state-dump] ObjectManager.UpdateAvailableMemory - 90522 total (0 active), Execution time: mean = 5.077 us, total = 459.574 ms, Queueing time: mean = 88.345 us, max = 48.698 ms, min = 2.040 us, total = 7.997 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 45285 total (1 active), Execution time: mean = 16.068 us, total = 727.647 ms, Queueing time: mean = 66.323 us, max = 41.182 ms, min = -0.000 s, total = 3.003 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 36170 total (1 active), Execution time: mean = 434.288 us, total = 15.708 s, Queueing time: mean = 65.704 us, max = 27.346 ms, min = -0.000 s, total = 2.377 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9060 total (1 active), Execution time: mean = 14.951 us, total = 135.459 ms, Queueing time: mean = 62.929 us, max = 4.009 ms, min = 6.093 us, total = 570.139 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9059 total (1 active), Execution time: mean = 8.311 us, 
total = 75.291 ms, Queueing time: mean = 173.276 us, max = 4.336 ms, min = -0.000 s, total = 1.570 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9059 total (1 active), Execution time: mean = 3.179 us, total = 28.797 ms, Queueing time: mean = 176.625 us, max = 4.341 ms, min = 2.496 us, total = 1.600 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9057 total (0 active), Execution time: mean = 99.341 us, total = 899.736 ms, Queueing time: mean = 93.034 us, max = 2.573 ms, min = 2.559 us, total = 842.613 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9057 total (0 active), Execution time: mean = 558.634 us, total = 5.060 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3021 total (1 active), Execution time: mean = 8.053 us, total = 24.327 ms, Queueing time: mean = 68.875 us, max = 6.635 ms, min = 11.179 us, total = 208.070 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1812 total (0 active), Execution time: mean = 1.363 ms, total = 2.471 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1812 total (0 active), Execution time: mean = 50.339 us, total = 91.215 ms, Queueing time: mean = 91.868 us, max = 3.960 ms, min = 6.906 us, total = 166.464 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1812 total (1 active), Execution time: mean = 529.464 us, total = 959.389 ms, Queueing time: mean = 368.414 us, max = 2.197 ms, min = 6.917 us, total = 667.567 ms [state-dump] NodeManager.GcsCheckAlive - 1812 total (1 active), Execution time: mean = 295.397 us, total = 535.259 ms, Queueing time: mean = 602.152 us, max = 2.395 ms, min = 5.323 us, total = 1.091 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 906 total (1 active), Execution 
time: mean = 1.731 ms, total = 1.569 s, Queueing time: mean = 63.004 us, max = 1.632 ms, min = 10.033 us, total = 57.082 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 151 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 396.614 ms, Queueing time: mean = 62.548 us, max = 225.125 us, min = 13.784 us, total = 9.445 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 
889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 
12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:44:16,238 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:44:16,623 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": 
{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 
[state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num 
pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: 
[state-dump] Event stats: [state-dump] Global stats: 795794 total (35 active) [state-dump] Queueing time: mean = 4.536 ms, max = 590.169 s, min = -0.000 s, total = 3609.715 s [state-dump] Execution time: mean = 11.465 ms, total = 9124.171 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 191457 total (0 active), Execution time: mean = 458.596 us, total = 87.801 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 191457 total (0 active), Execution time: mean = 31.051 us, total = 5.945 s, Queueing time: mean = 94.935 us, max = 23.460 ms, min = 1.417 us, total = 18.176 s [state-dump] RaySyncer.OnDemandBroadcasting - 91123 total (1 active), Execution time: mean = 9.356 us, total = 852.517 ms, Queueing time: mean = 81.903 us, max = 65.085 ms, min = -0.000 s, total = 7.463 s [state-dump] NodeManager.CheckGC - 91123 total (1 active), Execution time: mean = 3.686 us, total = 335.918 ms, Queueing time: mean = 86.747 us, max = 60.039 ms, min = 3.126 us, total = 7.905 s [state-dump] ObjectManager.UpdateAvailableMemory - 91122 total (0 active), Execution time: mean = 5.083 us, total = 463.130 ms, Queueing time: mean = 88.573 us, max = 48.698 ms, min = 2.040 us, total = 8.071 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 45584 total (1 active), Execution time: mean = 16.069 us, total = 732.504 ms, Queueing time: mean = 66.383 us, max = 41.182 ms, min = -0.000 s, total = 3.026 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 36410 total (1 active), Execution time: mean = 434.339 us, total = 15.814 s, Queueing time: mean = 65.776 us, max = 27.346 ms, min = -0.000 s, total = 2.395 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9120 total (1 active), Execution time: mean = 14.972 us, total = 136.541 ms, Queueing time: mean = 63.044 us, max = 4.009 ms, min = 6.093 us, total = 
574.965 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9119 total (1 active), Execution time: mean = 8.312 us, total = 75.794 ms, Queueing time: mean = 173.378 us, max = 4.336 ms, min = -0.000 s, total = 1.581 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9119 total (1 active), Execution time: mean = 3.179 us, total = 28.993 ms, Queueing time: mean = 176.727 us, max = 4.341 ms, min = 2.496 us, total = 1.612 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9117 total (0 active), Execution time: mean = 99.410 us, total = 906.318 ms, Queueing time: mean = 93.135 us, max = 2.573 ms, min = 2.559 us, total = 849.112 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9117 total (0 active), Execution time: mean = 559.297 us, total = 5.099 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3041 total (1 active), Execution time: mean = 8.056 us, total = 24.500 ms, Queueing time: mean = 68.864 us, max = 6.635 ms, min = 11.179 us, total = 209.415 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1824 total (0 active), Execution time: mean = 1.364 ms, total = 2.488 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1824 total (0 active), Execution time: mean = 50.354 us, total = 91.845 ms, Queueing time: mean = 92.020 us, max = 3.960 ms, min = 6.906 us, total = 167.845 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1824 total (1 active), Execution time: mean = 529.987 us, total = 966.696 ms, Queueing time: mean = 368.392 us, max = 2.197 ms, min = 6.917 us, total = 671.947 ms [state-dump] NodeManager.GcsCheckAlive - 1824 total (1 active), Execution time: mean = 295.401 us, total = 538.812 ms, Queueing time: mean = 602.661 us, max = 2.395 
ms, min = 5.323 us, total = 1.099 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 912 total (1 active), Execution time: mean = 1.732 ms, total = 1.579 s, Queueing time: mean = 63.004 us, max = 1.632 ms, min = 10.033 us, total = 57.459 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 152 total (1 active, 1 running), Execution time: mean = 2.629 ms, total = 399.611 ms, Queueing time: mean = 62.697 us, max = 225.125 us, min = 13.784 us, total = 9.530 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] 
CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: 
mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:45:16,238 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:45:16,626 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: 
[10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative 
spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - 
num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 801025 total (35 active) [state-dump] Queueing time: mean = 4.507 ms, max = 590.169 s, min = -0.000 s, total = 3610.127 s [state-dump] Execution time: mean = 11.392 ms, total = 9125.119 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 192717 total (0 active), Execution time: mean = 459.210 us, total = 88.497 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 192717 total (0 active), Execution time: mean = 31.077 us, total = 5.989 s, Queueing time: mean = 95.073 us, max = 23.460 ms, min = 1.417 us, total = 18.322 s [state-dump] RaySyncer.OnDemandBroadcasting - 91722 total (1 active), Execution time: mean = 9.355 us, total = 858.014 ms, Queueing time: mean = 81.948 us, max = 65.085 ms, min = -0.000 s, total = 7.516 s [state-dump] NodeManager.CheckGC - 91722 total (1 active), Execution time: mean = 3.681 us, total = 337.668 ms, Queueing time: mean = 86.795 us, max = 60.039 ms, min = 3.126 us, total = 7.961 s [state-dump] ObjectManager.UpdateAvailableMemory - 91721 total (0 active), Execution time: mean = 5.087 us, total = 466.566 ms, Queueing time: mean = 88.727 us, max = 48.698 ms, min = 2.040 us, total = 8.138 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 45884 total (1 active), Execution time: mean = 16.068 us, total = 737.264 ms, Queueing time: mean = 66.435 us, max = 41.182 ms, min = -0.000 s, total = 3.048 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 36649 total (1 active), Execution time: mean = 434.326 us, total = 15.918 s, Queueing time: mean = 65.846 us, max = 27.346 ms, min = -0.000 s, total = 2.413 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 9180 total (1 active), Execution time: mean = 14.977 us, total = 137.487 ms, Queueing time: mean = 63.070 us, max = 4.009 ms, min = 6.093 us, total = 578.983 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9179 total (1 active), Execution time: mean = 8.311 us, total = 76.289 ms, Queueing time: mean = 173.471 us, max = 4.336 ms, min = -0.000 s, total = 1.592 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9179 total (1 active), Execution time: mean = 3.180 us, total = 29.188 ms, Queueing time: mean = 176.819 us, max = 4.341 ms, min = 2.496 us, total = 1.623 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9177 total (0 active), Execution time: mean = 99.461 us, total = 912.753 ms, Queueing time: mean = 93.278 us, max = 2.573 ms, min = 2.559 us, total = 856.011 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9177 total (0 active), Execution time: mean = 559.839 us, total = 5.138 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3061 total (1 active), Execution time: mean = 8.060 us, total = 24.673 ms, Queueing time: mean = 68.903 us, max = 6.635 ms, min = 11.179 us, total = 210.913 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1836 total (0 active), Execution time: mean = 1.364 ms, total = 2.504 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1836 total (0 active), Execution time: mean = 50.363 us, total = 92.466 ms, Queueing time: mean = 92.143 us, max = 3.960 ms, min = 6.906 us, total = 169.175 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1836 total (1 active), Execution time: mean = 530.620 us, total = 974.217 ms, Queueing time: mean = 368.221 us, max = 2.197 
ms, min = 6.917 us, total = 676.053 ms [state-dump] NodeManager.GcsCheckAlive - 1836 total (1 active), Execution time: mean = 295.575 us, total = 542.677 ms, Queueing time: mean = 602.965 us, max = 2.395 ms, min = 5.323 us, total = 1.107 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 918 total (1 active), Execution time: mean = 1.733 ms, total = 1.591 s, Queueing time: mean = 62.990 us, max = 1.632 ms, min = 10.033 us, total = 57.825 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 153 total (1 active, 1 running), Execution time: mean = 2.629 ms, total = 402.279 ms, Queueing time: mean = 62.741 us, max = 225.125 us, min = 13.784 us, total = 9.599 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 
total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms 
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total 
= 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution 
time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 07:46:16,239 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:46:16,627 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 
[state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num 
objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - 
first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION 
[state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 806260 total (35 active) [state-dump] Queueing time: mean = 4.478 ms, max = 590.169 s, min = -0.000 s, total = 3610.503 s [state-dump] Execution time: mean = 11.319 ms, total = 9125.955 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 193977 total (0 active), Execution time: mean = 459.266 us, total = 89.087 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 193977 total (0 active), Execution time: mean = 31.081 us, total = 6.029 s, Queueing time: mean = 95.117 us, max = 23.460 ms, min = 1.417 us, total = 18.451 s [state-dump] RaySyncer.OnDemandBroadcasting - 92322 total (1 active), Execution time: mean = 9.358 us, total = 863.987 ms, Queueing time: mean = 81.964 us, max = 65.085 ms, min = -0.000 s, total = 7.567 s [state-dump] NodeManager.CheckGC - 92322 total (1 active), Execution time: mean = 3.678 us, total = 339.554 ms, Queueing time: mean = 86.818 us, max = 60.039 ms, min = 3.126 us, total = 8.015 s [state-dump] ObjectManager.UpdateAvailableMemory - 92321 total (0 active), Execution time: mean = 5.088 us, total = 469.708 ms, Queueing time: mean = 88.759 us, max = 48.698 ms, min = 2.040 us, total = 8.194 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 46184 total (1 active), Execution time: mean = 16.076 us, total = 742.453 ms, Queueing time: mean = 66.410 us, max = 41.182 ms, min = -0.000 s, total = 3.067 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 36889 total (1 active), Execution time: 
mean = 434.394 us, total = 16.024 s, Queueing time: mean = 65.882 us, max = 27.346 ms, min = -0.000 s, total = 2.430 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9240 total (1 active), Execution time: mean = 14.979 us, total = 138.401 ms, Queueing time: mean = 63.089 us, max = 4.009 ms, min = 6.093 us, total = 582.940 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9239 total (1 active), Execution time: mean = 8.314 us, total = 76.812 ms, Queueing time: mean = 173.584 us, max = 4.336 ms, min = -0.000 s, total = 1.604 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9239 total (1 active), Execution time: mean = 3.180 us, total = 29.376 ms, Queueing time: mean = 176.934 us, max = 4.341 ms, min = 2.496 us, total = 1.635 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9237 total (0 active), Execution time: mean = 99.458 us, total = 918.690 ms, Queueing time: mean = 93.339 us, max = 2.573 ms, min = 2.559 us, total = 862.175 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9237 total (0 active), Execution time: mean = 560.013 us, total = 5.173 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3081 total (1 active), Execution time: mean = 8.073 us, total = 24.873 ms, Queueing time: mean = 69.269 us, max = 6.635 ms, min = 11.179 us, total = 213.418 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1848 total (0 active), Execution time: mean = 1.364 ms, total = 2.520 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1848 total (0 active), Execution time: mean = 50.373 us, total = 93.089 ms, Queueing time: mean = 92.284 us, max = 3.960 ms, min = 6.906 us, total = 170.541 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1848 
total (1 active), Execution time: mean = 530.833 us, total = 980.979 ms, Queueing time: mean = 368.565 us, max = 2.197 ms, min = 6.917 us, total = 681.109 ms [state-dump] NodeManager.GcsCheckAlive - 1848 total (1 active), Execution time: mean = 295.529 us, total = 546.137 ms, Queueing time: mean = 603.567 us, max = 2.395 ms, min = 5.323 us, total = 1.115 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 924 total (1 active), Execution time: mean = 1.734 ms, total = 1.602 s, Queueing time: mean = 63.079 us, max = 1.632 ms, min = 10.033 us, total = 58.285 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 154 total (1 active, 1 running), Execution time: mean = 2.631 ms, total = 405.250 ms, Queueing time: mean = 62.871 us, max = 225.125 us, min = 13.784 us, total = 9.682 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 
28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean 
= 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution 
time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, 
total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-21 07:47:16,239 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:47:16,630 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 811491 total (35 active) [state-dump] Queueing time: mean = 4.450 ms, max = 590.169 s, min = -0.000 s, total = 3610.826 s [state-dump] Execution time: mean = 11.247 ms, total = 9126.715 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 195237 total (0 active), Execution time: mean = 458.991 us, total = 89.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 195237 total (0 active), Execution time: mean = 31.065 us, total = 6.065 s, Queueing time: mean = 95.041 us, max = 23.460 ms, min = 1.417 us, total = 18.556 s [state-dump] RaySyncer.OnDemandBroadcasting - 92921 total (1 active), Execution time: mean = 9.360 us, total = 869.758 ms, Queueing time: mean = 81.934 us, max = 65.085 ms, min = -0.000 s, total = 7.613 s [state-dump] NodeManager.CheckGC - 92921 total (1 active), Execution time: mean = 3.674 us, total = 341.397 ms, Queueing time: mean = 86.794 us, max = 60.039 ms, min = 3.126 us, total = 8.065 s [state-dump] ObjectManager.UpdateAvailableMemory - 92920 total (0 active), Execution time: mean = 5.088 us, total = 472.779 ms, Queueing time: mean = 88.700 us, max = 48.698 ms, min = 2.040 us, total = 8.242 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 46484 total (1 active), Execution time: mean = 16.080 us, total = 747.451 ms, Queueing time: mean = 66.365 
us, max = 41.182 ms, min = -0.000 s, total = 3.085 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 37128 total (1 active), Execution time: mean = 434.455 us, total = 16.130 s, Queueing time: mean = 65.837 us, max = 27.346 ms, min = -0.000 s, total = 2.444 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9300 total (1 active), Execution time: mean = 14.972 us, total = 139.236 ms, Queueing time: mean = 63.028 us, max = 4.009 ms, min = 6.093 us, total = 586.157 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9299 total (1 active), Execution time: mean = 8.313 us, total = 77.304 ms, Queueing time: mean = 173.531 us, max = 4.336 ms, min = -0.000 s, total = 1.614 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9299 total (1 active), Execution time: mean = 3.180 us, total = 29.568 ms, Queueing time: mean = 176.881 us, max = 4.341 ms, min = 2.496 us, total = 1.645 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9297 total (0 active), Execution time: mean = 99.491 us, total = 924.966 ms, Queueing time: mean = 93.245 us, max = 2.573 ms, min = 2.559 us, total = 866.897 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9297 total (0 active), Execution time: mean = 559.869 us, total = 5.205 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3101 total (1 active), Execution time: mean = 8.072 us, total = 25.031 ms, Queueing time: mean = 69.219 us, max = 6.635 ms, min = 11.179 us, total = 214.650 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1860 total (0 active), Execution time: mean = 1.363 ms, total = 2.535 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1860 total (0 active), Execution time: mean = 50.351 us, total = 
93.652 ms, Queueing time: mean = 92.266 us, max = 3.960 ms, min = 6.906 us, total = 171.615 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1860 total (1 active), Execution time: mean = 530.895 us, total = 987.464 ms, Queueing time: mean = 368.275 us, max = 2.197 ms, min = 6.917 us, total = 684.992 ms [state-dump] NodeManager.GcsCheckAlive - 1860 total (1 active), Execution time: mean = 295.456 us, total = 549.548 ms, Queueing time: mean = 603.400 us, max = 2.395 ms, min = 5.323 us, total = 1.122 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 930 total (1 active), Execution time: mean = 1.734 ms, total = 1.612 s, Queueing time: mean = 63.126 us, max = 1.632 ms, min = 10.033 us, total = 58.707 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 155 total (1 active, 1 running), Execution time: mean = 2.625 ms, total = 406.819 ms, Queueing time: mean = 62.796 us, max = 225.125 us, min = 13.784 us, total = 9.733 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 
us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 
us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 07:48:16,239 I 16746 16775] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 07:48:16,634 I 16746 16746] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] 
num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -1615174027500056430 Local resources: {"total":{accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "available": {accelerator_type:A40: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000], node:192.168.0.2: [10000], node:__internal_head__: [10000], CPU: [200000], memory: [869061529600000]}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -1615174027500056430{"total":{GPU: 20000, memory: 869061529600000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, memory: 869061529600000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"3c97e402ac1f323212309d4e3e7c6ac4544bfd0bfbd0bef1969903c3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled 
/ pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed 
messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 816726 total (35 active) [state-dump] Queueing time: mean = 4.422 ms, max = 590.169 s, min = -0.000 s, total = 3611.260 s [state-dump] Execution time: mean = 11.176 ms, total = 9127.675 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 196497 total (0 active), Execution time: mean = 459.597 us, total = 90.310 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 196497 total (0 active), Execution time: mean = 31.100 us, total = 6.111 s, Queueing time: mean = 95.235 us, max = 23.460 ms, min = 1.417 us, total = 18.713 s [state-dump] RaySyncer.OnDemandBroadcasting - 93521 total (1 active), Execution time: mean = 9.363 us, total = 875.683 ms, Queueing time: mean = 81.985 us, max = 65.085 ms, min = -0.000 s, total = 7.667 s [state-dump] NodeManager.CheckGC - 93521 total (1 active), Execution time: mean = 3.671 us, total = 343.290 ms, Queueing time: mean = 86.850 us, max = 60.039 ms, min = 3.126 us, total = 8.122 s [state-dump] ObjectManager.UpdateAvailableMemory - 93520 total (0 active), Execution time: mean = 5.094 us, total = 476.432 ms, Queueing time: mean = 88.873 us, max = 48.698 ms, min = 2.040 
us, total = 8.311 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 46784 total (1 active), Execution time: mean = 16.083 us, total = 752.414 ms, Queueing time: mean = 66.421 us, max = 41.182 ms, min = -0.000 s, total = 3.107 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 37368 total (1 active), Execution time: mean = 434.565 us, total = 16.239 s, Queueing time: mean = 65.881 us, max = 27.346 ms, min = -0.000 s, total = 2.462 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 9360 total (1 active), Execution time: mean = 14.976 us, total = 140.176 ms, Queueing time: mean = 63.579 us, max = 4.009 ms, min = 6.093 us, total = 595.099 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 9359 total (1 active), Execution time: mean = 8.316 us, total = 77.834 ms, Queueing time: mean = 173.690 us, max = 4.336 ms, min = -0.000 s, total = 1.626 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 9359 total (1 active), Execution time: mean = 3.180 us, total = 29.766 ms, Queueing time: mean = 177.041 us, max = 4.341 ms, min = 2.496 us, total = 1.657 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 9357 total (0 active), Execution time: mean = 99.556 us, total = 931.550 ms, Queueing time: mean = 93.394 us, max = 2.573 ms, min = 2.559 us, total = 873.886 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 9357 total (0 active), Execution time: mean = 560.455 us, total = 5.244 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 3121 total (1 active), Execution time: mean = 8.076 us, total = 25.205 ms, Queueing time: mean = 69.247 us, max = 6.635 ms, min = 11.179 us, total = 216.118 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1872 total (0 active), Execution time: mean = 1.364 ms, total = 2.553 s, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1872 total (0 active), Execution time: mean = 50.396 us, total = 94.342 ms, Queueing time: mean = 92.484 us, max = 3.960 ms, min = 6.906 us, total = 173.130 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1872 total (1 active), Execution time: mean = 531.529 us, total = 995.023 ms, Queueing time: mean = 368.258 us, max = 2.197 ms, min = 6.917 us, total = 689.378 ms [state-dump] NodeManager.GcsCheckAlive - 1872 total (1 active), Execution time: mean = 295.541 us, total = 553.253 ms, Queueing time: mean = 603.940 us, max = 2.395 ms, min = 5.323 us, total = 1.131 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 936 total (1 active), Execution time: mean = 1.734 ms, total = 1.623 s, Queueing time: mean = 63.250 us, max = 1.632 ms, min = 10.033 us, total = 59.202 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 156 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 409.807 ms, Queueing time: mean = 62.870 us, max = 225.125 us, min = 13.784 us, total = 9.808 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 7.202 us, total = 756.218 us, Queueing time: mean = 33.864 s, max = 590.169 s, min = 21.510 us, total = 3555.733 s [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 873.139 us, total = 73.344 ms, Queueing time: mean = 41.762 us, max = 626.320 us, min = 4.173 us, total = 3.508 ms [state-dump] - 22 total (0 active), Execution time: mean = 1.029 us, total = 22.628 us, Queueing time: mean = 106.490 us, max = 324.279 us, min = 19.153 us, total = 2.343 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.024 us, total = 22.519 us, Queueing time: mean = 38.945 us, max = 194.756 us, min = 13.446 us, total = 856.793 us 
[state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 232.459 us, total = 5.114 ms, Queueing time: mean = 707.909 ns, max = 1.120 us, min = 208.000 ns, total = 15.574 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 7.729 us, total = 162.308 us, Queueing time: mean = 14.543 us, max = 28.825 us, min = 6.671 us, total = 305.394 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 24.427 us, total = 512.975 us, Queueing time: mean = 1.870 ms, max = 18.580 ms, min = 8.816 us, total = 39.269 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 46.407 ms, total = 974.539 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 14.369 us, total = 301.741 us, Queueing time: mean = 1.186 ms, max = 24.010 ms, min = 26.317 us, total = 24.907 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 465.503 us, total = 9.776 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 87.064 us, total = 1.828 ms, Queueing time: mean = 26.651 us, max = 115.064 us, min = 9.507 us, total = 559.672 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 17 total (1 active), Execution time: mean = 529.339 s, total = 8998.762 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 16 total (0 active), Execution time: mean = 328.276 us, total = 5.252 ms, Queueing time: mean = 102.125 us, max = 410.175 us, min = 20.320 us, total = 
1.634 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 14 total (0 active), Execution time: mean = 1.105 ms, total = 15.474 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 14 total (0 active), Execution time: mean = 182.949 us, total = 2.561 ms, Queueing time: mean = 159.400 us, max = 347.804 us, min = 18.498 us, total = 2.232 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 14 total (0 active), Execution time: mean = 595.442 us, total = 8.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 14 total (0 active), Execution time: mean = 63.519 us, total = 889.268 us, Queueing time: mean = 78.448 us, max = 151.911 us, min = 16.930 us, total = 1.098 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 14 total (0 active), Execution time: mean = 126.142 us, total = 1.766 ms, Queueing time: mean = 72.891 us, max = 135.953 us, min = 20.450 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 230.070 us, total = 2.991 ms, Queueing time: mean = 2.551 ms, max = 6.946 ms, min = 27.474 us, total = 33.161 ms [state-dump] NodeManager.GCTaskFailureReason - 11 total (1 active), Execution time: mean = 6.619 us, total = 72.813 us, Queueing time: mean = 49.048 us, max = 79.050 us, min = 26.627 us, total = 539.524 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 3.078 us, total = 6.155 us, Queueing time: mean = 395.500 ns, max = 620.000 ns, min = 171.000 ns, total = 791.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.095 ms, total = 2.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 175.719 us, total = 351.437 us, Queueing time: mean = 739.221 us, max = 1.456 ms, min = 22.770 us, total = 1.478 ms [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.392 ms, total = 1.392 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.661 ms, total = 1.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 15.038 ms, total = 15.038 ms, Queueing time: mean = 21.407 us, max = 21.407 us, min = 21.407 us, total = 21.407 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 23.255 us, total = 23.255 us, Queueing time: mean = 13.831 us, max = 13.831 us, min = 13.831 us, total = 13.831 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 863.645 us, total = 863.645 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 285.909 us, total = 285.909 us, Queueing time: mean = 136.639 us, max = 136.639 us, min = 136.639 us, total = 136.639 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.209 ms, total = 1.209 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 
active), Execution time: mean = 57.882 us, total = 57.882 us, Queueing time: mean = 217.630 us, max = 217.630 us, min = 217.630 us, total = 217.630 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 39.308 us, total = 39.308 us, Queueing time: mean = 186.519 us, max = 186.519 us, min = 186.519 us, total = 186.519 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.030 ms, total = 1.030 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 249.337 us, total = 249.337 us, Queueing time: mean = 12.617 us, max = 12.617 us, min = 12.617 us, total = 12.617 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump]