[2025-01-20 21:45:50,421 I 6312 6312] (raylet) main.cc:180: Setting cluster ID to: d43798cad6aa4892a65de7d56eef016317bd29e66ad205a94eb3f8da
[2025-01-20 21:45:50,430 I 6312 6312] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-20 21:45:50,430 I 6312 6312] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-20 21:45:50,431 I 6312 6312] (raylet) main.cc:419: Setting node ID node_id=ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd
[2025-01-20 21:45:50,431 I 6312 6312] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-20 21:45:50,431 I 6312 6312] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-20 21:45:50,431 I 6312 6341] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-20 21:45:50,432 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-20 21:45:50,436 I 6312 6312] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 40407.
[2025-01-20 21:45:50,440 I 6312 6312] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-20 21:45:50,440 I 6312 6312] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-20 21:45:50,440 I 6312 6312] (raylet) node_manager.cc:287: Initializing NodeManager node_id=ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd
[2025-01-20 21:45:50,441 I 6312 6312] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 44399.
[2025-01-20 21:45:50,450 I 6312 6380] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-20 21:45:50,451 I 6312 6382] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-20 21:45:50,451 I 6312 6312] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-20 21:45:50,451 I 6312 6312] (raylet) event.cc:324: Set ray event level to warning
[2025-01-20 21:45:50,454 I 6312 6312] (raylet) raylet.cc:134: Raylet of id, ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:44399 object_manager address: 192.168.0.2:40407 hostname: 0cd925b1f73b [2025-01-20 21:45:50,460 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000}}, 
"available": {CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s 
[state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 35026135355246000.000 [state-dump] - num location lookups per second: 35026135355240000.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 0 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 0 [state-dump] - num PYTHON drivers: 0 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 0 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 27 total (13 active) [state-dump] Queueing time: mean = 1.957 ms, max = 
12.424 ms, min = 25.912 us, total = 52.836 ms [state-dump] Execution time: mean = 1.251 ms, total = 33.775 ms [state-dump] Event stats: [state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 435.734 us, total = 4.793 ms, Queueing time: mean = 4.794 ms, max = 12.424 ms, min = 61.413 us, total = 52.732 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 
1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.640 ms, total = 1.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-20 21:45:50,461 I 6312 6312] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd
[2025-01-20 21:45:50,607 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6418, the token is 0
[2025-01-20 21:45:50,611 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6419, the token is 1
[2025-01-20 21:45:50,614 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6420, the token is 2
[2025-01-20 21:45:50,617 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6421, the token is 3
[2025-01-20 21:45:50,619 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6422, the token is 4
[2025-01-20 21:45:50,622 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6423, the token is 5
[2025-01-20 21:45:50,625 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6424, the token is 6
[2025-01-20 21:45:50,628 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6425, the token is 7
[2025-01-20 21:45:50,630 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6426, the token is 8
[2025-01-20 21:45:50,634 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6427, the token is 9
[2025-01-20 21:45:50,637 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6428, the token is 10
[2025-01-20 21:45:50,640 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6429, the token is 11
[2025-01-20 21:45:50,643 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6430, the token is 12
[2025-01-20 21:45:50,645 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6431, the token is 13
[2025-01-20 21:45:50,648 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6432, the token is 14
[2025-01-20 21:45:50,652 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6433, the token is 15
[2025-01-20 21:45:50,655 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6434, the token is 16
[2025-01-20 21:45:50,658 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6435, the token is 17
[2025-01-20 21:45:50,662 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6436, the token is 18
[2025-01-20 21:45:50,666 I 6312 6312] (raylet) worker_pool.cc:501: Started worker process with pid 6437, the token is 19
[2025-01-20 21:45:51,365 I 6312 6341] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-20 21:45:51,719 I 6312 6312] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-20 21:46:00,460 W 6312 6335] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:60944: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-20 21:46:50,432 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:46:50,462 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: 
[21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, memory: 846480855040000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks 
received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. 
No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 5485 total (35 active) [state-dump] Queueing time: mean = 
323.494 us, max = 1.050 s, min = 77.000 ns, total = 1.774 s [state-dump] Execution time: mean = 423.527 us, total = 2.323 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1240 total (0 active), Execution time: mean = 36.454 us, total = 45.203 ms, Queueing time: mean = 103.984 us, max = 1.186 ms, min = 3.475 us, total = 128.941 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1240 total (0 active), Execution time: mean = 525.658 us, total = 651.816 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.233 us, total = 1.940 ms, Queueing time: mean = 99.217 us, max = 4.613 ms, min = 6.796 us, total = 59.530 ms [state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 10.772 us, total = 6.463 ms, Queueing time: mean = 92.725 us, max = 4.610 ms, min = 11.077 us, total = 55.635 ms [state-dump] ObjectManager.UpdateAvailableMemory - 599 total (0 active), Execution time: mean = 5.969 us, total = 3.575 ms, Queueing time: mean = 101.157 us, max = 410.120 us, min = 3.686 us, total = 60.593 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.727 us, total = 5.618 ms, Queueing time: mean = 121.056 us, max = 13.722 ms, min = 14.543 us, total = 36.317 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 454.803 us, total = 109.153 ms, Queueing time: mean = 73.370 us, max = 223.906 us, min = 9.711 us, total = 17.609 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 84 total (21 active), Execution time: mean = 7.284 us, total = 611.866 us, Queueing time: mean = 15.493 ms, max = 1.050 s, min = 23.644 us, total = 1.301 s [state-dump] ClientConnection.async_read.ProcessMessage - 63 total (0 active), Execution time: mean 
= 1.207 ms, total = 76.014 ms, Queueing time: mean = 19.352 us, max = 97.303 us, min = 3.345 us, total = 1.219 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 61 total (1 active), Execution time: mean = 14.489 us, total = 883.844 us, Queueing time: mean = 56.644 us, max = 162.840 us, min = 12.794 us, total = 3.455 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 8.067 us, total = 484.023 us, Queueing time: mean = 140.566 us, max = 1.016 ms, min = 9.317 us, total = 8.434 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 672.764 us, total = 40.366 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.913 us, total = 174.780 us, Queueing time: mean = 144.294 us, max = 1.011 ms, min = 7.341 us, total = 8.658 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 111.896 us, total = 6.714 ms, Queueing time: mean = 112.319 us, max = 187.906 us, min = 23.845 us, total = 6.739 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, 
total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.770 us, total = 184.164 us, Queueing time: mean = 58.351 us, max = 141.046 us, min = 22.842 us, total = 1.225 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 242.316 us, total = 2.908 ms, Queueing time: mean = 396.385 us, max = 1.174 ms, min = 13.847 us, total = 4.757 ms [state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 530.375 us, total = 6.365 ms, Queueing time: mean = 163.803 us, max = 919.329 us, min = 14.245 us, total = 1.966 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.448 ms, total = 17.382 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 44.331 us, total = 531.969 us, Queueing time: mean = 97.541 us, max = 251.761 us, min = 11.913 us, total = 1.170 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.426 ms, total = 8.557 ms, Queueing time: mean = 47.389 us, max = 128.460 us, min = 16.678 us, total = 284.331 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 
1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] RaySyncer.BroadcastMessage - 1 total (0 active), Execution time: mean = 70.654 us, total = 70.654 us, Queueing time: mean = 91.000 ns, max = 91.000 ns, min = 91.000 ns, total = 91.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] - 1 total (0 active), Execution time: mean = 591.000 ns, total = 591.000 ns, Queueing time: mean = 28.543 us, max = 28.543 us, min = 28.543 us, total = 28.543 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min 
= 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:47:50,432 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:47:50,465 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [170000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 0 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, memory: 846480855040000}}, "available": {CPU: 170000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 3 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6420 worker_id=dbcbe322696ea74e3364d8d9163cf7c26e5bfe511748651624bcb758): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6432 worker_id=6507fdd579e88f027ca035b08b3773127dda78b089e54960ba403dcd): {CPU: 
10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6434 worker_id=482ff017d7ed003de3312f2d726dba405d1363fdbf50bfe261f2153e): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_single_file, function_hash=a59077815fe94d3aad044273ce3c50c8} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 3/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] 
OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 17 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 10797 total (35 active) [state-dump] Queueing time: mean = 68.714 ms, max 
= 74.060 s, min = 77.000 ns, total = 741.910 s [state-dump] Execution time: mean = 291.886 us, total = 3.151 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2500 total (0 active), Execution time: mean = 35.075 us, total = 87.689 ms, Queueing time: mean = 92.675 us, max = 1.186 ms, min = 3.475 us, total = 231.688 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2500 total (0 active), Execution time: mean = 487.364 us, total = 1.218 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 12.100 us, total = 14.508 ms, Queueing time: mean = 82.630 us, max = 4.610 ms, min = 7.347 us, total = 99.073 ms [state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 3.136 us, total = 3.760 ms, Queueing time: mean = 90.603 us, max = 4.613 ms, min = 3.386 us, total = 108.634 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1198 total (0 active), Execution time: mean = 5.368 us, total = 6.430 ms, Queueing time: mean = 90.679 us, max = 410.120 us, min = 3.265 us, total = 108.633 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 17.244 us, total = 10.346 ms, Queueing time: mean = 93.527 us, max = 13.722 ms, min = 12.625 us, total = 56.116 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 480 total (1 active), Execution time: mean = 447.421 us, total = 214.762 ms, Queueing time: mean = 67.488 us, max = 223.906 us, min = 9.711 us, total = 32.394 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 121 total (1 active), Execution time: mean = 13.494 us, total = 1.633 ms, Queueing time: mean = 52.302 us, max = 162.840 us, min = 12.794 us, total = 6.329 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: 
mean = 111.200 us, total = 13.344 ms, Queueing time: mean = 101.537 us, max = 194.476 us, min = 15.798 us, total = 12.184 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 7.847 us, total = 941.681 us, Queueing time: mean = 166.089 us, max = 2.380 ms, min = 9.317 us, total = 19.931 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 2.760 us, total = 331.197 us, Queueing time: mean = 169.684 us, max = 2.379 ms, min = 7.341 us, total = 20.362 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 active), Execution time: mean = 631.542 us, total = 75.785 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 7.077 us, total = 665.280 us, Queueing time: mean = 7.884 s, max = 74.060 s, min = 23.644 us, total = 741.110 s [state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 1.042 ms, total = 76.084 ms, Queueing time: mean = 17.667 us, max = 97.303 us, min = 3.345 us, total = 1.290 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 7.901 us, total = 323.954 us, Queueing time: mean = 52.499 us, max = 141.046 us, min = 14.380 us, total = 2.152 ms [state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 255.035 us, total = 6.121 ms, Queueing time: mean = 560.787 us, max = 2.263 ms, min = 13.847 us, total = 13.459 ms [state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 547.726 us, total = 13.145 ms, Queueing time: mean = 291.470 us, max = 1.812 ms, min = 14.245 us, total = 6.995 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 
44.435 us, total = 1.066 ms, Queueing time: mean = 87.753 us, max = 251.761 us, min = 11.913 us, total = 2.106 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.355 ms, total = 32.515 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.642 ms, total = 19.707 ms, Queueing time: mean = 47.503 us, max = 128.460 us, min = 16.678 us, total = 570.039 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 16.060 us, total = 160.596 us, Queueing time: mean = 45.904 
us, max = 110.387 us, min = 15.433 us, total = 459.045 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 602.904 us, total = 6.029 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 80.962 us, total = 809.616 us, Queueing time: mean = 39.641 us, max = 119.454 us, min = 19.147 us, total = 396.406 us [state-dump] - 9 total (0 active), Execution time: mean = 797.889 ns, total = 7.181 us, Queueing time: mean = 71.623 us, max = 186.936 us, min = 25.025 us, total = 644.609 us [state-dump] RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 195.845 us, total = 1.763 ms, Queueing time: mean = 564.222 ns, max = 740.000 ns, min = 91.000 ns, total = 5.078 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 7 total (0 active), Execution time: mean = 566.719 us, total = 3.967 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 7 total (0 active), Execution time: mean = 107.449 us, total = 752.145 us, Queueing time: mean = 100.971 us, max = 134.757 us, min = 29.765 us, total = 706.800 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.199 ms, total = 2.398 ms, Queueing time: mean = 25.925 us, max = 51.850 us, min = 51.850 us, total = 51.850 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 
1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:48:50,432 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:48:50,467 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, 
node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 16045 total (35 active) [state-dump] Queueing time: mean = 54.065 ms, max 
= 125.305 s, min = 77.000 ns, total = 867.477 s [state-dump] Execution time: mean = 241.933 us, total = 3.882 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3760 total (0 active), Execution time: mean = 32.013 us, total = 120.369 ms, Queueing time: mean = 81.713 us, max = 1.186 ms, min = 3.385 us, total = 307.240 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3760 total (0 active), Execution time: mean = 458.559 us, total = 1.724 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 1799 total (1 active), Execution time: mean = 11.405 us, total = 20.518 ms, Queueing time: mean = 77.534 us, max = 4.610 ms, min = 7.347 us, total = 139.484 ms [state-dump] NodeManager.CheckGC - 1799 total (1 active), Execution time: mean = 3.015 us, total = 5.424 ms, Queueing time: mean = 84.976 us, max = 4.613 ms, min = 3.386 us, total = 152.871 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1798 total (0 active), Execution time: mean = 4.905 us, total = 8.819 ms, Queueing time: mean = 79.346 us, max = 410.120 us, min = 2.098 us, total = 142.663 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 16.409 us, total = 14.768 ms, Queueing time: mean = 79.875 us, max = 13.722 ms, min = 10.528 us, total = 71.887 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 438.234 us, total = 315.090 ms, Queueing time: mean = 62.964 us, max = 223.906 us, min = 9.711 us, total = 45.271 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 181 total (1 active), Execution time: mean = 12.907 us, total = 2.336 ms, Queueing time: mean = 47.246 us, max = 162.840 us, min = 10.329 us, total = 8.552 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: 
mean = 104.985 us, total = 18.897 ms, Queueing time: mean = 89.234 us, max = 194.476 us, min = 13.994 us, total = 16.062 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 7.436 us, total = 1.338 ms, Queueing time: mean = 161.948 us, max = 2.380 ms, min = 9.317 us, total = 29.151 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.662 us, total = 479.115 us, Queueing time: mean = 165.297 us, max = 2.379 ms, min = 7.341 us, total = 29.754 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 599.551 us, total = 107.919 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.227 us, total = 686.587 us, Queueing time: mean = 9.120 s, max = 125.305 s, min = 23.644 us, total = 866.415 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 1.029 ms, total = 76.113 ms, Queueing time: mean = 19.648 us, max = 164.240 us, min = 3.345 us, total = 1.454 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 7.953 us, total = 485.137 us, Queueing time: mean = 54.118 us, max = 188.727 us, min = 14.380 us, total = 3.301 ms [state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 251.809 us, total = 9.065 ms, Queueing time: mean = 558.688 us, max = 2.263 ms, min = 13.847 us, total = 20.113 ms [state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 508.325 us, total = 18.300 ms, Queueing time: mean = 317.076 us, max = 1.812 ms, min = 14.245 us, total = 11.415 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean 
= 42.828 us, total = 1.542 ms, Queueing time: mean = 76.092 us, max = 251.761 us, min = 11.913 us, total = 2.739 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.309 ms, total = 47.142 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.629 ms, total = 29.331 ms, Queueing time: mean = 49.205 us, max = 128.460 us, min = 16.678 us, total = 885.691 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] - 12 total (0 active), Execution time: mean = 807.500 ns, total = 9.690 us, Queueing time: mean = 72.146 us, max = 186.936 us, min = 
25.025 us, total = 865.752 us [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 197.386 us, total = 2.369 ms, Queueing time: mean = 586.250 ns, max = 836.000 ns, min = 91.000 ns, total = 7.035 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 542.204 us, total = 5.422 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 16.060 us, total = 160.596 us, Queueing time: mean = 45.904 us, max = 110.387 us, min = 15.433 us, total = 459.045 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 602.904 us, total = 6.029 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 80.962 us, total = 809.616 us, Queueing time: mean = 39.641 us, max = 119.454 us, min = 19.147 us, total = 396.406 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 106.935 us, total = 1.069 ms, Queueing time: mean = 84.444 us, max = 134.757 us, min = 29.765 us, total = 844.441 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 1.704 ms, total = 5.113 ms, Queueing time: mean = 33.514 us, max = 51.850 us, min = 48.693 us, total = 100.543 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, 
total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:49:50,433 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:49:50,470 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, 
node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 846480855040000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 21280 total (35 active) [state-dump] Queueing time: mean = 40.779 ms, max 
= 125.305 s, min = 77.000 ns, total = 867.787 s [state-dump] Execution time: mean = 219.804 us, total = 4.677 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5020 total (0 active), Execution time: mean = 31.075 us, total = 155.997 ms, Queueing time: mean = 81.165 us, max = 1.186 ms, min = 3.385 us, total = 407.450 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5020 total (0 active), Execution time: mean = 455.772 us, total = 2.288 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 2399 total (1 active), Execution time: mean = 10.695 us, total = 25.658 ms, Queueing time: mean = 75.950 us, max = 4.610 ms, min = 7.347 us, total = 182.204 ms [state-dump] NodeManager.CheckGC - 2399 total (1 active), Execution time: mean = 2.952 us, total = 7.082 ms, Queueing time: mean = 82.769 us, max = 4.613 ms, min = 3.386 us, total = 198.562 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2398 total (0 active), Execution time: mean = 4.789 us, total = 11.485 ms, Queueing time: mean = 79.258 us, max = 410.120 us, min = 2.098 us, total = 190.060 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 16.050 us, total = 19.260 ms, Queueing time: mean = 73.982 us, max = 13.722 ms, min = 9.036 us, total = 88.778 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 959 total (1 active), Execution time: mean = 432.678 us, total = 414.938 ms, Queueing time: mean = 61.937 us, max = 223.906 us, min = 6.365 us, total = 59.398 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 241 total (1 active), Execution time: mean = 12.595 us, total = 3.035 ms, Queueing time: mean = 47.453 us, max = 162.840 us, min = 7.069 us, total = 11.436 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 active), Execution time: mean = 7.416 us, 
total = 1.780 ms, Queueing time: mean = 165.171 us, max = 2.380 ms, min = 9.317 us, total = 39.641 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 240 total (0 active), Execution time: mean = 102.577 us, total = 24.618 ms, Queueing time: mean = 87.642 us, max = 195.803 us, min = 13.994 us, total = 21.034 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 2.621 us, total = 629.014 us, Queueing time: mean = 168.480 us, max = 2.379 ms, min = 7.341 us, total = 40.435 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 240 total (0 active), Execution time: mean = 592.858 us, total = 142.286 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.227 us, total = 686.587 us, Queueing time: mean = 9.120 s, max = 125.305 s, min = 23.644 us, total = 866.415 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 7.783 us, total = 630.412 us, Queueing time: mean = 54.447 us, max = 188.727 us, min = 14.380 us, total = 4.410 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 1.029 ms, total = 76.113 ms, Queueing time: mean = 19.648 us, max = 164.240 us, min = 3.345 us, total = 1.454 ms [state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 251.100 us, total = 12.053 ms, Queueing time: mean = 579.262 us, max = 2.263 ms, min = 13.847 us, total = 27.805 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 43.992 us, total = 2.112 ms, Queueing time: mean = 78.563 us, max = 251.761 us, min = 11.913 us, total = 3.771 ms [state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean 
= 519.366 us, total = 24.930 ms, Queueing time: mean = 322.621 us, max = 1.812 ms, min = 14.245 us, total = 15.486 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.354 ms, total = 64.971 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.645 ms, total = 39.471 ms, Queueing time: mean = 52.858 us, max = 128.460 us, min = 16.678 us, total = 1.269 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] - 12 total (0 active), Execution time: mean = 807.500 ns, total = 9.690 us, Queueing time: mean = 72.146 us, max = 186.936 us, min = 
25.025 us, total = 865.752 us [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 197.386 us, total = 2.369 ms, Queueing time: mean = 586.250 ns, max = 836.000 ns, min = 91.000 ns, total = 7.035 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 542.204 us, total = 5.422 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 16.060 us, total = 160.596 us, Queueing time: mean = 45.904 us, max = 110.387 us, min = 15.433 us, total = 459.045 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 80.962 us, total = 809.616 us, Queueing time: mean = 39.641 us, max = 119.454 us, min = 19.147 us, total = 396.406 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 602.904 us, total = 6.029 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 106.935 us, total = 1.069 ms, Queueing time: mean = 84.444 us, max = 134.757 us, min = 29.765 us, total = 844.441 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 1.955 ms, total = 7.822 ms, Queueing time: mean = 28.572 us, max = 51.850 us, min = 13.745 us, total = 114.288 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, 
total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:50:50,433 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:50:50,473 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [190000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 0 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 
10000, GPU: 20000, memory: 846480855040000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000}}, "available": {node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 190000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6434 worker_id=482ff017d7ed003de3312f2d726dba405d1363fdbf50bfe261f2153e): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_single_file, function_hash=a59077815fe94d3aad044273ce3c50c8} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] 
ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] 
- num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma 
notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 26599 total (35 active) [state-dump] Queueing time: mean = 94.900 ms, max = 165.621 s, min = 77.000 ns, total = 2524.239 s [state-dump] Execution time: mean = 204.999 us, total = 5.453 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6280 total (0 active), Execution time: mean = 31.026 us, total = 194.841 ms, Queueing time: mean = 78.728 us, max = 3.225 ms, min = 2.778 us, total = 494.409 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6280 total (0 active), Execution time: mean = 447.334 us, total = 2.809 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 2998 total (1 active), Execution time: mean = 11.065 us, total = 33.172 ms, Queueing time: mean = 73.459 us, max = 4.610 ms, min = 7.347 us, total = 220.230 ms [state-dump] NodeManager.CheckGC - 2998 total (1 active), Execution time: mean = 2.954 us, total = 8.856 ms, Queueing time: mean = 80.673 us, max = 4.613 ms, min = 3.386 us, total = 241.858 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2997 total (0 active), Execution time: mean = 4.646 us, total = 13.924 ms, Queueing time: mean = 75.141 us, max = 410.120 us, min = 2.098 us, total = 225.197 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 15.644 us, total = 23.466 ms, Queueing time: mean = 71.008 us, max = 13.722 ms, min = 9.036 us, total = 106.511 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 430.075 us, total = 515.230 ms, Queueing time: mean = 60.567 us, max = 223.906 us, min = 6.365 us, total = 72.559 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 301 total (1 active), Execution time: mean = 12.320 us, total = 3.708 ms, Queueing time: mean 
= 46.650 us, max = 162.840 us, min = 7.069 us, total = 14.042 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 active), Execution time: mean = 7.344 us, total = 2.203 ms, Queueing time: mean = 167.138 us, max = 2.380 ms, min = 9.317 us, total = 50.142 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 300 total (0 active), Execution time: mean = 104.865 us, total = 31.459 ms, Queueing time: mean = 86.029 us, max = 195.803 us, min = 13.994 us, total = 25.809 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution time: mean = 2.594 us, total = 778.066 us, Queueing time: mean = 170.405 us, max = 2.379 ms, min = 7.341 us, total = 51.122 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 300 total (0 active), Execution time: mean = 590.572 us, total = 177.172 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 105 total (21 active), Execution time: mean = 6.919 us, total = 726.544 us, Queueing time: mean = 24.025 s, max = 165.621 s, min = 23.644 us, total = 2522.584 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 7.403 us, total = 747.671 us, Queueing time: mean = 51.986 us, max = 188.727 us, min = 14.380 us, total = 5.251 ms [state-dump] ClientConnection.async_read.ProcessMessage - 84 total (0 active), Execution time: mean = 906.672 us, total = 76.160 ms, Queueing time: mean = 18.636 us, max = 164.240 us, min = 2.397 us, total = 1.565 ms [state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 262.827 us, total = 15.770 ms, Queueing time: mean = 584.672 us, max = 2.263 ms, min = 13.847 us, total = 35.080 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution time: mean = 43.240 us, total = 2.594 
ms, Queueing time: mean = 76.506 us, max = 251.761 us, min = 11.913 us, total = 4.590 ms [state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 515.190 us, total = 30.911 ms, Queueing time: mean = 340.038 us, max = 1.812 ms, min = 14.245 us, total = 20.402 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 total (0 active), Execution time: mean = 1.339 ms, total = 80.330 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.668 ms, total = 50.039 ms, Queueing time: mean = 51.374 us, max = 128.460 us, min = 16.678 us, total = 1.541 ms [state-dump] - 22 total (0 active), Execution time: mean = 774.409 ns, total = 17.037 us, Queueing time: mean = 59.139 us, max = 186.936 us, min = 20.527 us, total = 1.301 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] RaySyncer.BroadcastMessage - 22 total (0 active), Execution time: mean = 192.348 us, total = 4.232 ms, Queueing time: mean = 597.182 ns, max = 924.000 ns, min = 91.000 ns, total = 13.138 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms 
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 19 total (0 active), Execution time: mean = 519.726 us, total = 9.875 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 19 total (0 active), Execution time: mean = 98.109 us, total = 1.864 ms, Queueing time: mean = 82.261 us, max = 165.892 us, min = 26.601 us, total = 1.563 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.084 ms, total = 10.421 ms, Queueing time: mean = 40.859 us, max = 90.007 us, min = 13.745 us, total = 204.295 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 
active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:51:50,433 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 
168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:51:50,476 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, 
"labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 
[state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 31837 total (35 active) [state-dump] Queueing time: mean = 84.750 ms, max 
= 173.627 s, min = 77.000 ns, total = 2698.200 s [state-dump] Execution time: mean = 196.415 us, total = 6.253 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7539 total (0 active), Execution time: mean = 31.120 us, total = 234.613 ms, Queueing time: mean = 79.208 us, max = 3.225 ms, min = 2.778 us, total = 597.149 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7539 total (0 active), Execution time: mean = 446.975 us, total = 3.370 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 3598 total (1 active), Execution time: mean = 10.784 us, total = 38.801 ms, Queueing time: mean = 74.323 us, max = 4.610 ms, min = 7.347 us, total = 267.416 ms [state-dump] NodeManager.CheckGC - 3598 total (1 active), Execution time: mean = 2.950 us, total = 10.614 ms, Queueing time: mean = 81.253 us, max = 4.613 ms, min = 3.386 us, total = 292.348 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 4.686 us, total = 16.857 ms, Queueing time: mean = 77.141 us, max = 410.120 us, min = 2.098 us, total = 277.478 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 15.680 us, total = 28.223 ms, Queueing time: mean = 70.110 us, max = 13.722 ms, min = 9.036 us, total = 126.198 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1438 total (1 active), Execution time: mean = 430.281 us, total = 618.745 ms, Queueing time: mean = 61.432 us, max = 223.906 us, min = 6.365 us, total = 88.339 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), Execution time: mean = 7.463 us, total = 2.687 ms, Queueing time: mean = 168.946 us, max = 2.380 ms, min = 9.317 us, total = 60.821 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 360 total (0 active), 
Execution time: mean = 103.529 us, total = 37.270 ms, Queueing time: mean = 88.010 us, max = 195.803 us, min = 13.994 us, total = 31.684 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean = 12.138 us, total = 4.370 ms, Queueing time: mean = 47.876 us, max = 169.436 us, min = 7.069 us, total = 17.235 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 2.607 us, total = 938.652 us, Queueing time: mean = 172.285 us, max = 2.379 ms, min = 7.341 us, total = 62.023 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 360 total (0 active), Execution time: mean = 585.696 us, total = 210.851 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 7.436 us, total = 899.814 us, Queueing time: mean = 53.590 us, max = 188.727 us, min = 14.380 us, total = 6.484 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 266.905 us, total = 19.217 ms, Queueing time: mean = 590.249 us, max = 2.263 ms, min = 13.847 us, total = 42.498 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 43.721 us, total = 3.148 ms, Queueing time: mean = 79.924 us, max = 251.761 us, min = 11.913 us, total = 5.755 ms [state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), 
Execution time: mean = 513.146 us, total = 36.947 ms, Queueing time: mean = 351.342 us, max = 1.812 ms, min = 14.245 us, total = 25.297 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 active), Execution time: mean = 1.347 ms, total = 96.997 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.685 ms, total = 60.667 ms, Queueing time: mean = 54.412 us, max = 128.460 us, min = 16.678 us, total = 1.959 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 
s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), Execution time: mean = 2.199 ms, total = 13.191 ms, Queueing time: mean = 36.569 us, max = 90.007 us, min = 13.745 us, total = 219.416 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:52:50,433 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:52:50,478 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 
200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 37066 total (35 active) [state-dump] Queueing time: mean = 72.806 ms, max 
= 173.627 s, min = 77.000 ns, total = 2698.630 s [state-dump] Execution time: mean = 194.975 us, total = 7.227 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 8799 total (0 active), Execution time: mean = 32.039 us, total = 281.908 ms, Queueing time: mean = 84.775 us, max = 3.225 ms, min = 2.778 us, total = 745.938 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 8799 total (0 active), Execution time: mean = 463.514 us, total = 4.078 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 4197 total (1 active), Execution time: mean = 10.717 us, total = 44.980 ms, Queueing time: mean = 77.154 us, max = 4.610 ms, min = 7.347 us, total = 323.813 ms [state-dump] NodeManager.CheckGC - 4197 total (1 active), Execution time: mean = 2.975 us, total = 12.485 ms, Queueing time: mean = 83.959 us, max = 4.613 ms, min = 3.386 us, total = 352.376 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4196 total (0 active), Execution time: mean = 4.868 us, total = 20.424 ms, Queueing time: mean = 82.789 us, max = 410.120 us, min = 2.098 us, total = 347.382 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2100 total (1 active), Execution time: mean = 15.953 us, total = 33.501 ms, Queueing time: mean = 71.000 us, max = 13.722 ms, min = 9.036 us, total = 149.099 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1677 total (1 active), Execution time: mean = 435.273 us, total = 729.954 ms, Queueing time: mean = 63.917 us, max = 299.758 us, min = 6.365 us, total = 107.188 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 420 total (1 active), Execution time: mean = 7.630 us, total = 3.205 ms, Queueing time: mean = 171.102 us, max = 2.380 ms, min = 9.317 us, total = 71.863 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 420 total (1 active), Execution time: mean = 12.454 
us, total = 5.231 ms, Queueing time: mean = 57.046 us, max = 2.582 ms, min = 7.069 us, total = 23.959 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 420 total (1 active), Execution time: mean = 2.635 us, total = 1.107 ms, Queueing time: mean = 174.505 us, max = 2.379 ms, min = 7.341 us, total = 73.292 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 419 total (0 active), Execution time: mean = 103.920 us, total = 43.543 ms, Queueing time: mean = 92.009 us, max = 195.803 us, min = 13.994 us, total = 38.552 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 419 total (0 active), Execution time: mean = 598.340 us, total = 250.705 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 141 total (1 active), Execution time: mean = 7.700 us, total = 1.086 ms, Queueing time: mean = 57.192 us, max = 188.727 us, min = 14.380 us, total = 8.064 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.GcsCheckAlive - 84 total (1 active), Execution time: mean = 267.395 us, total = 22.461 ms, Queueing time: mean = 604.421 us, max = 2.263 ms, min = 13.847 us, total = 50.771 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 84 total (0 active), Execution time: mean = 45.216 us, total = 3.798 ms, Queueing time: mean = 84.967 us, max = 251.761 us, min = 11.913 us, total = 7.137 ms [state-dump] NodeManager.deadline_timer.record_metrics - 84 total (1 active), Execution time: 
mean = 514.737 us, total = 43.238 ms, Queueing time: mean = 363.370 us, max = 1.812 ms, min = 14.245 us, total = 30.523 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 84 total (0 active), Execution time: mean = 1.369 ms, total = 114.962 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 42 total (1 active), Execution time: mean = 1.700 ms, total = 71.384 ms, Queueing time: mean = 58.637 us, max = 130.592 us, min = 16.678 us, total = 2.463 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.291 ms, total = 16.039 ms, Queueing time: mean = 39.693 us, max = 90.007 us, min = 13.745 us, total = 277.850 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 
us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:53:50,434 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:53:50,481 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 
200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 42298 total (35 active) [state-dump] Queueing time: mean = 63.810 ms, max 
= 173.627 s, min = -0.000 s, total = 2699.032 s [state-dump] Execution time: mean = 193.011 us, total = 8.164 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10059 total (0 active), Execution time: mean = 32.439 us, total = 326.303 ms, Queueing time: mean = 88.153 us, max = 3.225 ms, min = 2.778 us, total = 886.729 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 10059 total (0 active), Execution time: mean = 472.858 us, total = 4.756 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 4796 total (1 active), Execution time: mean = 10.619 us, total = 50.929 ms, Queueing time: mean = 78.420 us, max = 4.610 ms, min = 7.347 us, total = 376.100 ms [state-dump] NodeManager.CheckGC - 4796 total (1 active), Execution time: mean = 2.979 us, total = 14.288 ms, Queueing time: mean = 85.108 us, max = 4.613 ms, min = 3.386 us, total = 408.180 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4795 total (0 active), Execution time: mean = 4.969 us, total = 23.825 ms, Queueing time: mean = 85.557 us, max = 730.033 us, min = 2.098 us, total = 410.245 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2400 total (1 active), Execution time: mean = 16.260 us, total = 39.023 ms, Queueing time: mean = 71.634 us, max = 13.722 ms, min = 9.036 us, total = 171.923 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1917 total (1 active), Execution time: mean = 436.961 us, total = 837.655 ms, Queueing time: mean = 65.423 us, max = 299.758 us, min = -0.000 s, total = 125.416 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 480 total (1 active), Execution time: mean = 7.783 us, total = 3.736 ms, Queueing time: mean = 172.424 us, max = 2.380 ms, min = 9.317 us, total = 82.764 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 480 total (1 active), Execution time: mean = 12.788 
us, total = 6.138 ms, Queueing time: mean = 57.461 us, max = 2.582 ms, min = 7.069 us, total = 27.581 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 480 total (1 active), Execution time: mean = 2.663 us, total = 1.278 ms, Queueing time: mean = 175.882 us, max = 2.379 ms, min = 7.341 us, total = 84.423 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 479 total (0 active), Execution time: mean = 103.785 us, total = 49.713 ms, Queueing time: mean = 95.860 us, max = 195.803 us, min = 13.994 us, total = 45.917 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 479 total (0 active), Execution time: mean = 605.949 us, total = 290.250 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 161 total (1 active), Execution time: mean = 7.766 us, total = 1.250 ms, Queueing time: mean = 57.944 us, max = 188.727 us, min = 14.380 us, total = 9.329 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] NodeManager.GcsCheckAlive - 96 total (1 active), Execution time: mean = 269.551 us, total = 25.877 ms, Queueing time: mean = 611.763 us, max = 2.263 ms, min = 13.847 us, total = 58.729 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 96 total (0 active), Execution time: mean = 46.524 us, total = 4.466 ms, Queueing time: mean = 87.336 us, max = 251.761 us, min = 11.913 us, total = 8.384 ms [state-dump] NodeManager.deadline_timer.record_metrics - 96 total (1 active), Execution time: mean = 517.238 us, total = 49.655 ms, Queueing time: mean = 370.002 us, max = 1.812 ms, min = 14.245 us, total = 35.520 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 96 total (0 active), 
Execution time: mean = 1.389 ms, total = 133.381 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 48 total (1 active), Execution time: mean = 1.720 ms, total = 82.577 ms, Queueing time: mean = 59.448 us, max = 130.592 us, min = 16.678 us, total = 2.854 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 8 total (1 active, 1 running), Execution time: mean = 2.333 ms, total = 18.668 ms, Queueing time: mean = 40.396 us, max = 90.007 us, min = 13.745 us, total = 323.171 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 
us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:54:50,434 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:54:50,484 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 
200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 47531 total (35 active) [state-dump] Queueing time: mean = 56.792 ms, max 
= 173.627 s, min = -0.000 s, total = 2699.396 s [state-dump] Execution time: mean = 189.887 us, total = 9.026 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 11319 total (0 active), Execution time: mean = 32.438 us, total = 367.165 ms, Queueing time: mean = 88.839 us, max = 3.225 ms, min = 2.778 us, total = 1.006 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 11319 total (0 active), Execution time: mean = 474.425 us, total = 5.370 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 5396 total (1 active), Execution time: mean = 10.580 us, total = 57.092 ms, Queueing time: mean = 79.346 us, max = 4.610 ms, min = 7.347 us, total = 428.149 ms [state-dump] NodeManager.CheckGC - 5396 total (1 active), Execution time: mean = 2.994 us, total = 16.156 ms, Queueing time: mean = 85.965 us, max = 4.613 ms, min = 3.386 us, total = 463.866 ms [state-dump] ObjectManager.UpdateAvailableMemory - 5395 total (0 active), Execution time: mean = 5.010 us, total = 27.030 ms, Queueing time: mean = 86.661 us, max = 730.033 us, min = 2.098 us, total = 467.535 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2699 total (1 active), Execution time: mean = 16.462 us, total = 44.432 ms, Queueing time: mean = 71.691 us, max = 13.722 ms, min = 9.036 us, total = 193.495 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2156 total (1 active), Execution time: mean = 438.488 us, total = 945.379 ms, Queueing time: mean = 66.018 us, max = 299.758 us, min = -0.000 s, total = 142.334 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 540 total (1 active), Execution time: mean = 7.884 us, total = 4.258 ms, Queueing time: mean = 170.915 us, max = 2.380 ms, min = 9.000 us, total = 92.294 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 540 total (1 active), Execution time: mean = 12.994 us, 
total = 7.017 ms, Queueing time: mean = 57.089 us, max = 2.582 ms, min = 7.069 us, total = 30.828 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 540 total (1 active), Execution time: mean = 2.686 us, total = 1.450 ms, Queueing time: mean = 174.418 us, max = 2.379 ms, min = 6.061 us, total = 94.186 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 539 total (0 active), Execution time: mean = 103.710 us, total = 55.900 ms, Queueing time: mean = 96.263 us, max = 195.803 us, min = 13.994 us, total = 51.886 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 539 total (0 active), Execution time: mean = 606.119 us, total = 326.698 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 181 total (1 active), Execution time: mean = 7.753 us, total = 1.403 ms, Queueing time: mean = 58.206 us, max = 188.727 us, min = 14.380 us, total = 10.535 ms [state-dump] NodeManager.GcsCheckAlive - 108 total (1 active), Execution time: mean = 268.510 us, total = 28.999 ms, Queueing time: mean = 605.764 us, max = 2.263 ms, min = 12.935 us, total = 65.423 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 108 total (0 active), Execution time: mean = 46.643 us, total = 5.037 ms, Queueing time: mean = 89.619 us, max = 251.761 us, min = 11.913 us, total = 9.679 ms [state-dump] NodeManager.deadline_timer.record_metrics - 108 total (1 active), Execution time: mean = 517.622 us, total = 55.903 ms, Queueing time: mean = 361.501 us, max = 1.812 ms, min = 10.977 us, total = 39.042 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 108 total (0 active), Execution time: mean = 1.385 ms, total = 149.624 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), 
Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 54 total (1 active), Execution time: mean = 1.704 ms, total = 92.033 ms, Queueing time: mean = 58.205 us, max = 130.592 us, min = 16.678 us, total = 3.143 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 9 total (1 active, 1 running), Execution time: mean = 2.381 ms, total = 21.428 ms, Queueing time: mean = 42.036 us, max = 90.007 us, min = 13.745 us, total = 378.324 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 631.912 ms, total = 1.264 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 
us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 259.428 us, total = 259.428 us, Queueing time: mean = 137.867 us, max = 137.867 us, min = 137.867 us, total = 137.867 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:55:50,434 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:55:50,488 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 
200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 52765 total (35 active) [state-dump] Queueing time: mean = 51.167 ms, max 
= 173.627 s, min = -0.000 s, total = 2699.830 s [state-dump] Execution time: mean = 11.511 ms, total = 607.377 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 12579 total (0 active), Execution time: mean = 33.089 us, total = 416.225 ms, Queueing time: mean = 91.355 us, max = 3.225 ms, min = 2.778 us, total = 1.149 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 12579 total (0 active), Execution time: mean = 482.573 us, total = 6.070 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 5995 total (1 active), Execution time: mean = 10.740 us, total = 64.386 ms, Queueing time: mean = 81.240 us, max = 4.610 ms, min = 7.347 us, total = 487.033 ms [state-dump] NodeManager.CheckGC - 5995 total (1 active), Execution time: mean = 3.023 us, total = 18.121 ms, Queueing time: mean = 87.970 us, max = 4.613 ms, min = 3.386 us, total = 527.381 ms [state-dump] ObjectManager.UpdateAvailableMemory - 5994 total (0 active), Execution time: mean = 5.155 us, total = 30.900 ms, Queueing time: mean = 89.877 us, max = 730.033 us, min = 2.098 us, total = 538.726 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2999 total (1 active), Execution time: mean = 16.866 us, total = 50.581 ms, Queueing time: mean = 73.077 us, max = 13.722 ms, min = 9.036 us, total = 219.158 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2396 total (1 active), Execution time: mean = 441.871 us, total = 1.059 s, Queueing time: mean = 68.232 us, max = 978.705 us, min = -0.000 s, total = 163.485 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 600 total (1 active), Execution time: mean = 8.039 us, total = 4.824 ms, Queueing time: mean = 172.478 us, max = 2.380 ms, min = 9.000 us, total = 103.487 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 600 total (1 active), Execution time: mean = 13.431 us, 
total = 8.059 ms, Queueing time: mean = 57.529 us, max = 2.582 ms, min = 7.069 us, total = 34.518 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 600 total (1 active), Execution time: mean = 2.712 us, total = 1.627 ms, Queueing time: mean = 176.053 us, max = 2.379 ms, min = 6.061 us, total = 105.632 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 599 total (0 active), Execution time: mean = 105.087 us, total = 62.947 ms, Queueing time: mean = 98.750 us, max = 213.099 us, min = 13.994 us, total = 59.151 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 599 total (0 active), Execution time: mean = 615.171 us, total = 368.487 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 201 total (1 active), Execution time: mean = 8.123 us, total = 1.633 ms, Queueing time: mean = 60.991 us, max = 188.727 us, min = 14.380 us, total = 12.259 ms [state-dump] NodeManager.GcsCheckAlive - 120 total (1 active), Execution time: mean = 270.690 us, total = 32.483 ms, Queueing time: mean = 611.739 us, max = 2.263 ms, min = 12.935 us, total = 73.409 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 120 total (0 active), Execution time: mean = 47.569 us, total = 5.708 ms, Queueing time: mean = 91.732 us, max = 251.761 us, min = 11.913 us, total = 11.008 ms [state-dump] NodeManager.deadline_timer.record_metrics - 120 total (1 active), Execution time: mean = 520.787 us, total = 62.494 ms, Queueing time: mean = 365.418 us, max = 1.812 ms, min = 10.977 us, total = 43.850 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 120 total (0 active), Execution time: mean = 1.407 ms, total = 168.812 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), 
Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 60 total (1 active), Execution time: mean = 1.715 ms, total = 102.898 ms, Queueing time: mean = 59.474 us, max = 130.592 us, min = 16.678 us, total = 3.568 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 10 total (1 active, 1 running), Execution time: mean = 2.448 ms, total = 24.481 ms, Queueing time: mean = 46.746 us, max = 90.007 us, min = 13.745 us, total = 467.458 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 
137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:56:50,434 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:56:50,490 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, 
node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 57999 total (35 active) [state-dump] Queueing time: mean = 46.556 ms, max 
= 173.627 s, min = -0.000 s, total = 2700.207 s [state-dump] Execution time: mean = 10.488 ms, total = 608.299 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13839 total (0 active), Execution time: mean = 33.278 us, total = 460.539 ms, Queueing time: mean = 91.990 us, max = 3.225 ms, min = 2.778 us, total = 1.273 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 13839 total (0 active), Execution time: mean = 486.341 us, total = 6.730 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 6595 total (1 active), Execution time: mean = 10.756 us, total = 70.933 ms, Queueing time: mean = 81.734 us, max = 4.610 ms, min = 7.347 us, total = 539.034 ms [state-dump] NodeManager.CheckGC - 6595 total (1 active), Execution time: mean = 3.042 us, total = 20.059 ms, Queueing time: mean = 88.460 us, max = 4.613 ms, min = 3.386 us, total = 583.396 ms [state-dump] ObjectManager.UpdateAvailableMemory - 6594 total (0 active), Execution time: mean = 5.230 us, total = 34.486 ms, Queueing time: mean = 90.863 us, max = 730.033 us, min = 2.098 us, total = 599.151 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3299 total (1 active), Execution time: mean = 17.169 us, total = 56.641 ms, Queueing time: mean = 72.909 us, max = 13.722 ms, min = 9.036 us, total = 240.527 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2635 total (1 active), Execution time: mean = 443.038 us, total = 1.167 s, Queueing time: mean = 68.453 us, max = 978.705 us, min = -0.000 s, total = 180.374 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 660 total (1 active), Execution time: mean = 8.115 us, total = 5.356 ms, Queueing time: mean = 172.566 us, max = 2.380 ms, min = 9.000 us, total = 113.894 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 660 total (1 active), Execution time: mean = 13.731 us, 
total = 9.063 ms, Queueing time: mean = 57.827 us, max = 2.582 ms, min = 7.069 us, total = 38.166 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 660 total (1 active), Execution time: mean = 2.724 us, total = 1.798 ms, Queueing time: mean = 176.164 us, max = 2.379 ms, min = 6.061 us, total = 116.268 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 659 total (0 active), Execution time: mean = 105.454 us, total = 69.494 ms, Queueing time: mean = 99.388 us, max = 213.099 us, min = 13.994 us, total = 65.497 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 659 total (0 active), Execution time: mean = 619.729 us, total = 408.401 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 221 total (1 active), Execution time: mean = 8.267 us, total = 1.827 ms, Queueing time: mean = 61.432 us, max = 188.727 us, min = 14.380 us, total = 13.576 ms [state-dump] NodeManager.GcsCheckAlive - 132 total (1 active), Execution time: mean = 271.050 us, total = 35.779 ms, Queueing time: mean = 614.210 us, max = 2.263 ms, min = 12.935 us, total = 81.076 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 132 total (0 active), Execution time: mean = 47.712 us, total = 6.298 ms, Queueing time: mean = 94.251 us, max = 307.469 us, min = 11.913 us, total = 12.441 ms [state-dump] NodeManager.deadline_timer.record_metrics - 132 total (1 active), Execution time: mean = 522.854 us, total = 69.017 ms, Queueing time: mean = 366.551 us, max = 1.812 ms, min = 10.977 us, total = 48.385 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 132 total (0 active), Execution time: mean = 1.410 ms, total = 186.155 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 106 total (21 active), 
Execution time: mean = 6.967 us, total = 738.464 us, Queueing time: mean = 25.436 s, max = 173.627 s, min = 23.644 us, total = 2696.211 s [state-dump] ClientConnection.async_read.ProcessMessage - 85 total (0 active), Execution time: mean = 896.210 us, total = 76.178 ms, Queueing time: mean = 19.628 us, max = 164.240 us, min = 2.397 us, total = 1.668 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 66 total (1 active), Execution time: mean = 1.723 ms, total = 113.698 ms, Queueing time: mean = 59.604 us, max = 130.592 us, min = 16.678 us, total = 3.934 ms [state-dump] - 23 total (0 active), Execution time: mean = 767.522 ns, total = 17.653 us, Queueing time: mean = 57.774 us, max = 186.936 us, min = 20.527 us, total = 1.329 ms [state-dump] RaySyncer.BroadcastMessage - 23 total (0 active), Execution time: mean = 191.820 us, total = 4.412 ms, Queueing time: mean = 594.696 ns, max = 924.000 ns, min = 91.000 ns, total = 13.678 us [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 20 total (0 active), Execution time: mean = 509.521 us, total = 10.190 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 20 total (0 active), Execution time: mean = 34.947 us, total = 698.946 us, Queueing time: mean = 145.072 us, max = 468.077 us, min = 15.433 us, total = 2.901 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 20 total (0 active), Execution time: mean = 71.718 us, total = 1.434 ms, Queueing time: mean = 104.973 us, max = 194.984 us, min = 19.147 us, total = 2.099 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 20 total (0 active), Execution time: mean = 741.906 us, total = 14.838 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 20 total (0 active), Execution time: mean = 97.516 us, total = 1.950 ms, Queueing time: mean = 80.335 us, max = 165.892 us, min = 26.601 us, total = 1.607 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active, 1 running), Execution time: mean = 2.500 ms, total = 27.503 ms, Queueing time: mean = 56.211 us, max = 150.858 us, min = 13.745 us, total = 618.316 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 
137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:57:50,435 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:57:50,494 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [130000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 0 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, 
memory: 846480855040000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 130000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 7 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6419 worker_id=eca066f1b2255f726090a97adc6cf31ee2886a725015f18fc1d73b10): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6429 worker_id=3e6f01e25b8adce0e5033eab3b878028fc921ab2657d2ca763d301f2): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6418 worker_id=2ff4e0ae1ca71896347edfd5fa3e747cdb476ec67107334357b1c390): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6435 worker_id=f66bcea1905ee6bb85d11a7da6f13c9caebc591c2c554782d01f810a): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6431 worker_id=275c86f15e2e4a747109ba390b15f4ec5a2fcbbb77e32de4532ac75a): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6432 worker_id=6507fdd579e88f027ca035b08b3773127dda78b089e54960ba403dcd): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=6434 worker_id=482ff017d7ed003de3312f2d726dba405d1363fdbf50bfe261f2153e): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: 
[state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_single_file, function_hash=a59077815fe94d3aad044273ce3c50c8} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 7/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 
[state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 13 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 63295 total (35 active) [state-dump] Queueing time: mean = 117.064 ms, 
max = 470.895 s, min = -0.000 s, total = 7409.548 s [state-dump] Execution time: mean = 9.625 ms, total = 609.237 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 15099 total (0 active), Execution time: mean = 33.443 us, total = 504.954 ms, Queueing time: mean = 92.651 us, max = 3.225 ms, min = 2.778 us, total = 1.399 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 15099 total (0 active), Execution time: mean = 489.197 us, total = 7.386 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 7194 total (1 active), Execution time: mean = 10.960 us, total = 78.846 ms, Queueing time: mean = 82.107 us, max = 4.610 ms, min = 7.347 us, total = 590.677 ms [state-dump] NodeManager.CheckGC - 7194 total (1 active), Execution time: mean = 3.052 us, total = 21.958 ms, Queueing time: mean = 89.018 us, max = 4.613 ms, min = 3.386 us, total = 640.399 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7193 total (0 active), Execution time: mean = 5.274 us, total = 37.938 ms, Queueing time: mean = 91.868 us, max = 730.033 us, min = 2.098 us, total = 660.803 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3599 total (1 active), Execution time: mean = 17.404 us, total = 62.639 ms, Queueing time: mean = 73.091 us, max = 13.722 ms, min = 9.036 us, total = 263.054 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2875 total (1 active), Execution time: mean = 443.961 us, total = 1.276 s, Queueing time: mean = 68.791 us, max = 978.705 us, min = -0.000 s, total = 197.773 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 720 total (1 active), Execution time: mean = 8.163 us, total = 5.877 ms, Queueing time: mean = 173.644 us, max = 2.380 ms, min = 9.000 us, total = 125.024 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 720 total (1 active), Execution time: mean = 13.944 
us, total = 10.039 ms, Queueing time: mean = 57.924 us, max = 2.582 ms, min = 7.069 us, total = 41.705 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 720 total (1 active), Execution time: mean = 2.737 us, total = 1.971 ms, Queueing time: mean = 177.263 us, max = 2.379 ms, min = 6.061 us, total = 127.629 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 719 total (0 active), Execution time: mean = 105.474 us, total = 75.835 ms, Queueing time: mean = 101.069 us, max = 1.188 ms, min = 13.994 us, total = 72.669 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 719 total (0 active), Execution time: mean = 623.017 us, total = 447.949 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 241 total (1 active), Execution time: mean = 8.422 us, total = 2.030 ms, Queueing time: mean = 61.922 us, max = 188.727 us, min = 9.881 us, total = 14.923 ms [state-dump] NodeManager.GcsCheckAlive - 144 total (1 active), Execution time: mean = 272.405 us, total = 39.226 ms, Queueing time: mean = 618.641 us, max = 2.263 ms, min = 12.935 us, total = 89.084 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 144 total (0 active), Execution time: mean = 48.144 us, total = 6.933 ms, Queueing time: mean = 94.565 us, max = 307.469 us, min = 11.913 us, total = 13.617 ms [state-dump] NodeManager.deadline_timer.record_metrics - 144 total (1 active), Execution time: mean = 530.223 us, total = 76.352 ms, Queueing time: mean = 364.052 us, max = 1.812 ms, min = 10.977 us, total = 52.423 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 144 total (0 active), Execution time: mean = 1.414 ms, total = 203.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 116 total (21 
active), Execution time: mean = 6.998 us, total = 811.734 us, Queueing time: mean = 63.838 s, max = 470.895 s, min = 23.644 us, total = 7405.158 s [state-dump] ClientConnection.async_read.ProcessMessage - 95 total (0 active), Execution time: mean = 802.849 us, total = 76.271 ms, Queueing time: mean = 19.547 us, max = 164.240 us, min = 2.397 us, total = 1.857 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 72 total (1 active), Execution time: mean = 1.733 ms, total = 124.762 ms, Queueing time: mean = 58.693 us, max = 130.592 us, min = 16.678 us, total = 4.226 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 27 total (0 active), Execution time: mean = 818.852 ns, total = 22.109 us, Queueing time: mean = 66.232 us, max = 186.936 us, min = 20.527 us, total = 1.788 ms [state-dump] RaySyncer.BroadcastMessage - 27 total (0 active), Execution time: mean = 201.249 us, total = 5.434 ms, Queueing time: mean = 609.815 ns, max = 924.000 ns, min = 91.000 ns, total = 16.465 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 23 total (0 active), Execution time: mean = 540.501 us, total = 12.432 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 23 total (0 active), Execution time: mean = 102.840 us, total = 2.365 
ms, Queueing time: mean = 85.921 us, max = 165.892 us, min = 26.601 us, total = 1.976 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 12 total (1 active, 1 running), Execution time: mean = 2.514 ms, total = 30.172 ms, Queueing time: mean = 56.522 us, max = 150.858 us, min = 13.745 us, total = 678.269 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 21:58:50,435 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:58:50,495 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, 
node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 68559 total (35 active) [state-dump] Queueing time: mean = 114.720 ms, 
max = 470.895 s, min = -0.000 s, total = 7865.066 s [state-dump] Execution time: mean = 8.901 ms, total = 610.222 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 16359 total (0 active), Execution time: mean = 34.087 us, total = 557.636 ms, Queueing time: mean = 94.389 us, max = 3.225 ms, min = 2.778 us, total = 1.544 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 16359 total (0 active), Execution time: mean = 493.856 us, total = 8.079 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 7794 total (1 active), Execution time: mean = 11.262 us, total = 87.776 ms, Queueing time: mean = 83.596 us, max = 4.610 ms, min = 7.347 us, total = 651.545 ms [state-dump] NodeManager.CheckGC - 7794 total (1 active), Execution time: mean = 3.066 us, total = 23.900 ms, Queueing time: mean = 90.785 us, max = 4.613 ms, min = 3.386 us, total = 707.582 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7793 total (0 active), Execution time: mean = 5.342 us, total = 41.629 ms, Queueing time: mean = 93.387 us, max = 730.033 us, min = 2.098 us, total = 727.768 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3899 total (1 active), Execution time: mean = 17.509 us, total = 68.266 ms, Queueing time: mean = 73.273 us, max = 13.722 ms, min = 9.036 us, total = 285.690 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3114 total (1 active), Execution time: mean = 445.991 us, total = 1.389 s, Queueing time: mean = 69.838 us, max = 978.705 us, min = -0.000 s, total = 217.474 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 780 total (1 active), Execution time: mean = 8.278 us, total = 6.457 ms, Queueing time: mean = 175.694 us, max = 2.380 ms, min = 9.000 us, total = 137.041 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 780 total (1 active), Execution time: mean = 14.223 
us, total = 11.094 ms, Queueing time: mean = 58.544 us, max = 2.582 ms, min = 7.069 us, total = 45.664 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 780 total (1 active), Execution time: mean = 2.762 us, total = 2.154 ms, Queueing time: mean = 179.374 us, max = 2.379 ms, min = 6.061 us, total = 139.911 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 779 total (0 active), Execution time: mean = 107.009 us, total = 83.360 ms, Queueing time: mean = 102.346 us, max = 1.188 ms, min = 13.994 us, total = 79.727 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 779 total (0 active), Execution time: mean = 627.998 us, total = 489.211 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 261 total (1 active), Execution time: mean = 8.539 us, total = 2.229 ms, Queueing time: mean = 62.503 us, max = 188.727 us, min = 9.881 us, total = 16.313 ms [state-dump] NodeManager.GcsCheckAlive - 156 total (1 active), Execution time: mean = 281.393 us, total = 43.897 ms, Queueing time: mean = 620.604 us, max = 2.263 ms, min = 12.935 us, total = 96.814 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 156 total (0 active), Execution time: mean = 49.130 us, total = 7.664 ms, Queueing time: mean = 96.053 us, max = 307.469 us, min = 11.913 us, total = 14.984 ms [state-dump] NodeManager.deadline_timer.record_metrics - 156 total (1 active), Execution time: mean = 536.194 us, total = 83.646 ms, Queueing time: mean = 369.003 us, max = 1.812 ms, min = 10.977 us, total = 57.564 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 156 total (0 active), Execution time: mean = 1.443 ms, total = 225.173 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 
active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 78 total (1 active), Execution time: mean = 1.750 ms, total = 136.525 ms, Queueing time: mean = 60.555 us, max = 130.592 us, min = 16.678 us, total = 4.723 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 
ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 13 total (1 active, 1 running), Execution time: mean = 2.591 ms, total = 33.678 ms, Queueing time: mean = 58.686 us, max = 150.858 us, min = 13.745 us, total = 762.922 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 21:59:50,435 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 21:59:50,498 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, 
node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 73791 total (35 active) [state-dump] Queueing time: mean = 106.591 ms, 
max = 470.895 s, min = -0.000 s, total = 7865.471 s [state-dump] Execution time: mean = 8.282 ms, total = 611.172 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 17619 total (0 active), Execution time: mean = 34.284 us, total = 604.057 ms, Queueing time: mean = 95.719 us, max = 3.225 ms, min = 2.778 us, total = 1.686 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 17619 total (0 active), Execution time: mean = 497.376 us, total = 8.763 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 8393 total (1 active), Execution time: mean = 11.223 us, total = 94.193 ms, Queueing time: mean = 84.287 us, max = 4.610 ms, min = 7.347 us, total = 707.420 ms [state-dump] NodeManager.CheckGC - 8393 total (1 active), Execution time: mean = 3.070 us, total = 25.766 ms, Queueing time: mean = 91.431 us, max = 4.613 ms, min = 3.386 us, total = 767.381 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8392 total (0 active), Execution time: mean = 5.386 us, total = 45.198 ms, Queueing time: mean = 94.159 us, max = 730.033 us, min = 2.098 us, total = 790.185 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4199 total (1 active), Execution time: mean = 17.675 us, total = 74.217 ms, Queueing time: mean = 73.601 us, max = 13.722 ms, min = 9.036 us, total = 309.051 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3354 total (1 active), Execution time: mean = 446.728 us, total = 1.498 s, Queueing time: mean = 70.226 us, max = 978.705 us, min = -0.000 s, total = 235.538 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 840 total (1 active), Execution time: mean = 8.341 us, total = 7.006 ms, Queueing time: mean = 174.598 us, max = 2.380 ms, min = 9.000 us, total = 146.663 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 840 total (1 active), Execution time: mean = 14.463 
us, total = 12.149 ms, Queueing time: mean = 58.733 us, max = 2.582 ms, min = 7.069 us, total = 49.336 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 840 total (1 active), Execution time: mean = 2.775 us, total = 2.331 ms, Queueing time: mean = 178.310 us, max = 2.379 ms, min = 6.061 us, total = 149.780 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 839 total (0 active), Execution time: mean = 106.886 us, total = 89.677 ms, Queueing time: mean = 102.818 us, max = 1.188 ms, min = 13.994 us, total = 86.264 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 839 total (0 active), Execution time: mean = 629.917 us, total = 528.501 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 281 total (1 active), Execution time: mean = 8.603 us, total = 2.417 ms, Queueing time: mean = 63.684 us, max = 188.727 us, min = 9.881 us, total = 17.895 ms [state-dump] NodeManager.GcsCheckAlive - 168 total (1 active), Execution time: mean = 289.575 us, total = 48.649 ms, Queueing time: mean = 608.652 us, max = 2.263 ms, min = 12.935 us, total = 102.253 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 168 total (0 active), Execution time: mean = 49.685 us, total = 8.347 ms, Queueing time: mean = 96.927 us, max = 307.469 us, min = 11.913 us, total = 16.284 ms [state-dump] NodeManager.deadline_timer.record_metrics - 168 total (1 active), Execution time: mean = 535.104 us, total = 89.897 ms, Queueing time: mean = 365.927 us, max = 1.812 ms, min = 10.977 us, total = 61.476 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 168 total (0 active), Execution time: mean = 1.468 ms, total = 246.706 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 
active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 84 total (1 active), Execution time: mean = 1.747 ms, total = 146.765 ms, Queueing time: mean = 60.682 us, max = 130.592 us, min = 16.678 us, total = 5.097 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 
ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 14 total (1 active, 1 running), Execution time: mean = 2.508 ms, total = 35.113 ms, Queueing time: mean = 56.557 us, max = 150.858 us, min = 13.745 us, total = 791.805 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:00:50,436 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:00:50,502 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, 
node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max 
= -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 79023 total (35 active) [state-dump] Queueing time: mean = 99.539 ms, max 
= 470.895 s, min = -0.000 s, total = 7865.896 s [state-dump] Execution time: mean = 7.746 ms, total = 612.147 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 18879 total (0 active), Execution time: mean = 34.625 us, total = 653.692 ms, Queueing time: mean = 96.949 us, max = 3.225 ms, min = 2.778 us, total = 1.830 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 18879 total (0 active), Execution time: mean = 501.199 us, total = 9.462 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 8992 total (1 active), Execution time: mean = 11.218 us, total = 100.870 ms, Queueing time: mean = 84.935 us, max = 4.610 ms, min = 7.347 us, total = 763.739 ms [state-dump] NodeManager.CheckGC - 8992 total (1 active), Execution time: mean = 3.075 us, total = 27.648 ms, Queueing time: mean = 92.062 us, max = 4.613 ms, min = 3.386 us, total = 827.818 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8991 total (0 active), Execution time: mean = 5.440 us, total = 48.912 ms, Queueing time: mean = 95.450 us, max = 730.033 us, min = 2.098 us, total = 858.190 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4499 total (1 active), Execution time: mean = 17.865 us, total = 80.374 ms, Queueing time: mean = 74.064 us, max = 13.722 ms, min = 9.036 us, total = 333.216 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3593 total (1 active), Execution time: mean = 447.745 us, total = 1.609 s, Queueing time: mean = 70.965 us, max = 978.705 us, min = -0.000 s, total = 254.978 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 900 total (1 active), Execution time: mean = 8.448 us, total = 7.603 ms, Queueing time: mean = 176.461 us, max = 2.380 ms, min = 9.000 us, total = 158.815 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 900 total (1 active), Execution time: mean = 14.611 us, 
total = 13.150 ms, Queueing time: mean = 59.281 us, max = 2.582 ms, min = 7.069 us, total = 53.353 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 900 total (1 active), Execution time: mean = 2.786 us, total = 2.508 ms, Queueing time: mean = 180.231 us, max = 2.379 ms, min = 6.061 us, total = 162.208 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 899 total (0 active), Execution time: mean = 107.188 us, total = 96.362 ms, Queueing time: mean = 103.764 us, max = 1.188 ms, min = 13.994 us, total = 93.283 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 899 total (0 active), Execution time: mean = 633.719 us, total = 569.714 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 301 total (1 active), Execution time: mean = 8.673 us, total = 2.611 ms, Queueing time: mean = 64.595 us, max = 188.727 us, min = 9.881 us, total = 19.443 ms [state-dump] NodeManager.GcsCheckAlive - 180 total (1 active), Execution time: mean = 291.555 us, total = 52.480 ms, Queueing time: mean = 616.885 us, max = 2.263 ms, min = 12.935 us, total = 111.039 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 180 total (0 active), Execution time: mean = 50.318 us, total = 9.057 ms, Queueing time: mean = 98.189 us, max = 307.469 us, min = 11.913 us, total = 17.674 ms [state-dump] NodeManager.deadline_timer.record_metrics - 180 total (1 active), Execution time: mean = 541.527 us, total = 97.475 ms, Queueing time: mean = 369.157 us, max = 1.812 ms, min = 10.977 us, total = 66.448 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 180 total (0 active), Execution time: mean = 1.486 ms, total = 267.417 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), 
Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 90 total (1 active), Execution time: mean = 1.762 ms, total = 158.574 ms, Queueing time: mean = 61.508 us, max = 130.592 us, min = 16.678 us, total = 5.536 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, 
Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 15 total (1 active, 1 running), Execution time: mean = 2.541 ms, total = 38.112 ms, Queueing time: mean = 57.466 us, max = 150.858 us, min = 13.745 us, total = 861.992 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 
137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:01:50,436 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:01:50,504 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 84258 total (35 active) [state-dump] Queueing time: mean = 93.360 ms, max 
= 470.895 s, min = -0.000 s, total = 7866.323 s [state-dump] Execution time: mean = 7.277 ms, total = 613.131 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 20139 total (0 active), Execution time: mean = 34.973 us, total = 704.327 ms, Queueing time: mean = 98.014 us, max = 3.225 ms, min = 2.778 us, total = 1.974 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 20139 total (0 active), Execution time: mean = 504.815 us, total = 10.166 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 9592 total (1 active), Execution time: mean = 11.300 us, total = 108.387 ms, Queueing time: mean = 85.824 us, max = 4.610 ms, min = 7.347 us, total = 823.227 ms [state-dump] NodeManager.CheckGC - 9592 total (1 active), Execution time: mean = 3.093 us, total = 29.668 ms, Queueing time: mean = 93.007 us, max = 4.613 ms, min = 3.386 us, total = 892.121 ms [state-dump] ObjectManager.UpdateAvailableMemory - 9591 total (0 active), Execution time: mean = 5.518 us, total = 52.924 ms, Queueing time: mean = 96.860 us, max = 730.033 us, min = 2.098 us, total = 928.986 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4799 total (1 active), Execution time: mean = 18.049 us, total = 86.615 ms, Queueing time: mean = 74.223 us, max = 13.722 ms, min = 9.036 us, total = 356.198 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3833 total (1 active), Execution time: mean = 449.289 us, total = 1.722 s, Queueing time: mean = 71.435 us, max = 978.705 us, min = -0.000 s, total = 273.809 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 960 total (1 active), Execution time: mean = 8.565 us, total = 8.222 ms, Queueing time: mean = 176.952 us, max = 2.380 ms, min = 9.000 us, total = 169.874 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 960 total (1 active), Execution time: mean = 14.908 us, 
total = 14.312 ms, Queueing time: mean = 59.594 us, max = 2.582 ms, min = 7.069 us, total = 57.210 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 960 total (1 active), Execution time: mean = 2.805 us, total = 2.693 ms, Queueing time: mean = 180.776 us, max = 2.379 ms, min = 6.061 us, total = 173.545 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 959 total (0 active), Execution time: mean = 107.314 us, total = 102.914 ms, Queueing time: mean = 104.020 us, max = 1.188 ms, min = 13.994 us, total = 99.755 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 959 total (0 active), Execution time: mean = 636.008 us, total = 609.932 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 321 total (1 active), Execution time: mean = 8.671 us, total = 2.784 ms, Queueing time: mean = 64.793 us, max = 188.727 us, min = 9.881 us, total = 20.798 ms [state-dump] NodeManager.GcsCheckAlive - 192 total (1 active), Execution time: mean = 296.010 us, total = 56.834 ms, Queueing time: mean = 615.490 us, max = 2.263 ms, min = 12.935 us, total = 118.174 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 192 total (0 active), Execution time: mean = 51.020 us, total = 9.796 ms, Queueing time: mean = 98.038 us, max = 307.469 us, min = 11.913 us, total = 18.823 ms [state-dump] NodeManager.deadline_timer.record_metrics - 192 total (1 active), Execution time: mean = 544.354 us, total = 104.516 ms, Queueing time: mean = 369.568 us, max = 1.812 ms, min = 10.977 us, total = 70.957 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 192 total (0 active), Execution time: mean = 1.500 ms, total = 287.949 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 
active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 96 total (1 active), Execution time: mean = 1.765 ms, total = 169.452 ms, Queueing time: mean = 61.436 us, max = 130.592 us, min = 16.678 us, total = 5.898 ms [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 
ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 16 total (1 active, 1 running), Execution time: mean = 2.581 ms, total = 41.304 ms, Queueing time: mean = 57.898 us, max = 150.858 us, min = 13.745 us, total = 926.372 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:02:50,437 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:02:50,508 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 89488 total (35 active) [state-dump] Queueing time: mean = 87.909 ms, max 
= 470.895 s, min = -0.000 s, total = 7866.765 s [state-dump] Execution time: mean = 6.863 ms, total = 614.123 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 21399 total (0 active), Execution time: mean = 35.458 us, total = 758.757 ms, Queueing time: mean = 99.386 us, max = 3.225 ms, min = 2.778 us, total = 2.127 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 21399 total (0 active), Execution time: mean = 507.967 us, total = 10.870 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 10191 total (1 active), Execution time: mean = 11.388 us, total = 116.051 ms, Queueing time: mean = 86.544 us, max = 4.610 ms, min = 7.347 us, total = 881.966 ms [state-dump] NodeManager.CheckGC - 10191 total (1 active), Execution time: mean = 3.107 us, total = 31.668 ms, Queueing time: mean = 93.789 us, max = 4.613 ms, min = 3.386 us, total = 955.808 ms [state-dump] ObjectManager.UpdateAvailableMemory - 10190 total (0 active), Execution time: mean = 5.599 us, total = 57.059 ms, Queueing time: mean = 98.065 us, max = 730.033 us, min = 2.098 us, total = 999.286 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5098 total (1 active), Execution time: mean = 18.377 us, total = 93.686 ms, Queueing time: mean = 74.679 us, max = 13.722 ms, min = 9.036 us, total = 380.714 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4072 total (1 active), Execution time: mean = 451.150 us, total = 1.837 s, Queueing time: mean = 72.332 us, max = 978.705 us, min = -0.000 s, total = 294.536 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1020 total (1 active), Execution time: mean = 8.678 us, total = 8.851 ms, Queueing time: mean = 177.670 us, max = 2.380 ms, min = 9.000 us, total = 181.223 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1020 total (1 active), Execution time: mean = 
15.128 us, total = 15.430 ms, Queueing time: mean = 60.022 us, max = 2.582 ms, min = 7.069 us, total = 61.223 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1020 total (1 active), Execution time: mean = 2.840 us, total = 2.897 ms, Queueing time: mean = 181.533 us, max = 2.379 ms, min = 6.061 us, total = 185.164 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1019 total (0 active), Execution time: mean = 108.082 us, total = 110.136 ms, Queueing time: mean = 105.218 us, max = 1.188 ms, min = 13.994 us, total = 107.217 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1019 total (0 active), Execution time: mean = 641.023 us, total = 653.202 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 341 total (1 active), Execution time: mean = 8.783 us, total = 2.995 ms, Queueing time: mean = 66.802 us, max = 363.446 us, min = 9.881 us, total = 22.780 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 204 total (0 active), Execution time: mean = 1.510 ms, total = 308.124 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 204 total (1 active), Execution time: mean = 544.785 us, total = 111.136 ms, Queueing time: mean = 373.048 us, max = 1.812 ms, min = 10.977 us, total = 76.102 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 204 total (0 active), Execution time: mean = 51.735 us, total = 10.554 ms, Queueing time: mean = 98.777 us, max = 307.469 us, min = 11.913 us, total = 20.151 ms [state-dump] NodeManager.GcsCheckAlive - 204 total (1 active), Execution time: mean = 299.447 us, total = 61.087 ms, Queueing time: mean = 616.285 us, max = 2.263 ms, min = 12.935 us, total = 125.722 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 
total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 102 total (1 active), Execution time: mean = 1.773 ms, total = 180.867 ms, Queueing time: mean = 64.117 us, max = 148.355 us, min = 16.678 us, total = 6.540 ms [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total 
= 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 17 total (1 active, 1 running), Execution time: mean = 2.589 ms, total = 44.007 ms, Queueing time: mean = 59.098 us, max = 150.858 us, min = 13.745 us, total = 1.005 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution 
time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:03:50,437 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:03:50,510 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 94720 total (35 active) [state-dump] Queueing time: mean = 83.057 ms, max 
= 470.895 s, min = -0.000 s, total = 7867.200 s [state-dump] Execution time: mean = 6.494 ms, total = 615.110 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 22659 total (0 active), Execution time: mean = 35.791 us, total = 810.984 ms, Queueing time: mean = 100.426 us, max = 3.225 ms, min = 2.778 us, total = 2.276 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 22659 total (0 active), Execution time: mean = 510.803 us, total = 11.574 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 10790 total (1 active), Execution time: mean = 11.460 us, total = 123.653 ms, Queueing time: mean = 87.324 us, max = 4.610 ms, min = 7.347 us, total = 942.228 ms [state-dump] NodeManager.CheckGC - 10790 total (1 active), Execution time: mean = 3.119 us, total = 33.655 ms, Queueing time: mean = 94.624 us, max = 4.613 ms, min = 3.386 us, total = 1.021 s [state-dump] ObjectManager.UpdateAvailableMemory - 10789 total (0 active), Execution time: mean = 5.654 us, total = 61.002 ms, Queueing time: mean = 98.653 us, max = 730.033 us, min = 2.098 us, total = 1.064 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5398 total (1 active), Execution time: mean = 18.546 us, total = 100.112 ms, Queueing time: mean = 75.037 us, max = 13.722 ms, min = 9.036 us, total = 405.051 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4312 total (1 active), Execution time: mean = 452.538 us, total = 1.951 s, Queueing time: mean = 72.974 us, max = 978.705 us, min = -0.000 s, total = 314.663 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1080 total (1 active), Execution time: mean = 8.760 us, total = 9.461 ms, Queueing time: mean = 178.611 us, max = 2.380 ms, min = 9.000 us, total = 192.900 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1080 total (1 active), Execution time: mean = 15.268 
us, total = 16.489 ms, Queueing time: mean = 60.292 us, max = 2.582 ms, min = 7.069 us, total = 65.115 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1080 total (1 active), Execution time: mean = 2.853 us, total = 3.081 ms, Queueing time: mean = 182.513 us, max = 2.379 ms, min = 6.061 us, total = 197.114 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1079 total (0 active), Execution time: mean = 108.244 us, total = 116.795 ms, Queueing time: mean = 105.796 us, max = 1.188 ms, min = 13.994 us, total = 114.154 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1079 total (0 active), Execution time: mean = 643.481 us, total = 694.316 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 361 total (1 active), Execution time: mean = 8.883 us, total = 3.207 ms, Queueing time: mean = 67.505 us, max = 363.446 us, min = 9.881 us, total = 24.369 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 216 total (0 active), Execution time: mean = 1.519 ms, total = 328.057 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 216 total (1 active), Execution time: mean = 546.211 us, total = 117.982 ms, Queueing time: mean = 376.358 us, max = 1.812 ms, min = 10.977 us, total = 81.293 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 216 total (0 active), Execution time: mean = 52.302 us, total = 11.297 ms, Queueing time: mean = 99.698 us, max = 307.469 us, min = 11.913 us, total = 21.535 ms [state-dump] NodeManager.GcsCheckAlive - 216 total (1 active), Execution time: mean = 303.324 us, total = 65.518 ms, Queueing time: mean = 617.663 us, max = 2.263 ms, min = 12.935 us, total = 133.415 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total 
(21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 108 total (1 active), Execution time: mean = 1.781 ms, total = 192.329 ms, Queueing time: mean = 65.225 us, max = 148.355 us, min = 16.678 us, total = 7.044 ms [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 
16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 18 total (1 active, 1 running), Execution time: mean = 2.609 ms, total = 46.967 ms, Queueing time: mean = 60.201 us, max = 150.858 us, min = 13.745 us, total = 1.084 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution 
time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:04:50,437 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:04:50,513 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 99954 total (35 active) [state-dump] Queueing time: mean = 78.712 ms, max 
= 470.895 s, min = -0.000 s, total = 7867.625 s [state-dump] Execution time: mean = 6.164 ms, total = 616.079 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 23919 total (0 active), Execution time: mean = 36.035 us, total = 861.928 ms, Queueing time: mean = 101.133 us, max = 3.225 ms, min = 2.778 us, total = 2.419 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 23919 total (0 active), Execution time: mean = 512.854 us, total = 12.267 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 11390 total (1 active), Execution time: mean = 11.428 us, total = 130.165 ms, Queueing time: mean = 87.821 us, max = 4.610 ms, min = 7.347 us, total = 1.000 s [state-dump] NodeManager.CheckGC - 11390 total (1 active), Execution time: mean = 3.127 us, total = 35.617 ms, Queueing time: mean = 95.087 us, max = 4.613 ms, min = 3.386 us, total = 1.083 s [state-dump] ObjectManager.UpdateAvailableMemory - 11389 total (0 active), Execution time: mean = 5.678 us, total = 64.672 ms, Queueing time: mean = 99.678 us, max = 730.033 us, min = 2.098 us, total = 1.135 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5698 total (1 active), Execution time: mean = 18.530 us, total = 105.584 ms, Queueing time: mean = 75.370 us, max = 13.722 ms, min = 9.036 us, total = 429.461 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4551 total (1 active), Execution time: mean = 453.010 us, total = 2.062 s, Queueing time: mean = 73.300 us, max = 978.705 us, min = -0.000 s, total = 333.587 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1140 total (1 active), Execution time: mean = 8.762 us, total = 9.988 ms, Queueing time: mean = 178.428 us, max = 2.380 ms, min = 8.211 us, total = 203.408 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1140 total (1 active), Execution time: mean = 15.244 us, 
total = 17.378 ms, Queueing time: mean = 60.762 us, max = 2.582 ms, min = 7.069 us, total = 69.269 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1140 total (1 active), Execution time: mean = 2.853 us, total = 3.252 ms, Queueing time: mean = 182.333 us, max = 2.379 ms, min = 4.508 us, total = 207.860 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1139 total (0 active), Execution time: mean = 109.434 us, total = 124.646 ms, Queueing time: mean = 106.807 us, max = 1.188 ms, min = 13.994 us, total = 121.653 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1139 total (0 active), Execution time: mean = 647.634 us, total = 737.655 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 381 total (1 active), Execution time: mean = 8.889 us, total = 3.387 ms, Queueing time: mean = 67.832 us, max = 363.446 us, min = 9.881 us, total = 25.844 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 228 total (0 active), Execution time: mean = 1.528 ms, total = 348.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 228 total (1 active), Execution time: mean = 548.710 us, total = 125.106 ms, Queueing time: mean = 373.677 us, max = 1.812 ms, min = 10.977 us, total = 85.198 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 228 total (0 active), Execution time: mean = 52.522 us, total = 11.975 ms, Queueing time: mean = 100.008 us, max = 307.469 us, min = 11.913 us, total = 22.802 ms [state-dump] NodeManager.GcsCheckAlive - 228 total (1 active), Execution time: mean = 302.352 us, total = 68.936 ms, Queueing time: mean = 618.307 us, max = 2.263 ms, min = 12.935 us, total = 140.974 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 
active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 114 total (1 active), Execution time: mean = 1.778 ms, total = 202.661 ms, Queueing time: mean = 65.404 us, max = 148.355 us, min = 16.678 us, total = 7.456 ms [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 
ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 19 total (1 active, 1 running), Execution time: mean = 2.613 ms, total = 49.638 ms, Queueing time: mean = 60.330 us, max = 150.858 us, min = 13.745 us, total = 1.146 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.546 s, total = 598.639 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean 
= 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 390.843 us, total = 781.686 us, Queueing time: mean = 104.124 us, max = 137.867 us, min = 70.382 us, total = 208.249 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:05:50,438 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:05:50,515 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 105188 total (35 active) [state-dump] Queueing time: mean = 74.800 ms, 
max = 470.895 s, min = -0.000 s, total = 7868.056 s [state-dump] Execution time: mean = 11.570 ms, total = 1217.069 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 25179 total (0 active), Execution time: mean = 36.278 us, total = 913.447 ms, Queueing time: mean = 101.924 us, max = 3.225 ms, min = 2.778 us, total = 2.566 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 25179 total (0 active), Execution time: mean = 515.334 us, total = 12.976 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 11989 total (1 active), Execution time: mean = 3.135 us, total = 37.581 ms, Queueing time: mean = 95.381 us, max = 4.613 ms, min = 3.386 us, total = 1.144 s [state-dump] RaySyncer.OnDemandBroadcasting - 11989 total (1 active), Execution time: mean = 11.432 us, total = 137.054 ms, Queueing time: mean = 88.125 us, max = 4.610 ms, min = 7.347 us, total = 1.057 s [state-dump] ObjectManager.UpdateAvailableMemory - 11988 total (0 active), Execution time: mean = 5.725 us, total = 68.629 ms, Queueing time: mean = 100.824 us, max = 730.033 us, min = 2.098 us, total = 1.209 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5998 total (1 active), Execution time: mean = 18.678 us, total = 112.030 ms, Queueing time: mean = 75.770 us, max = 13.722 ms, min = 9.036 us, total = 454.466 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4791 total (1 active), Execution time: mean = 453.592 us, total = 2.173 s, Queueing time: mean = 73.692 us, max = 978.705 us, min = -0.000 s, total = 353.057 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1200 total (1 active), Execution time: mean = 15.344 us, total = 18.413 ms, Queueing time: mean = 61.784 us, max = 2.582 ms, min = 7.069 us, total = 74.140 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1200 total (1 active), Execution 
time: mean = 2.863 us, total = 3.435 ms, Queueing time: mean = 182.288 us, max = 2.379 ms, min = 4.508 us, total = 218.745 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1200 total (1 active), Execution time: mean = 8.865 us, total = 10.638 ms, Queueing time: mean = 178.343 us, max = 2.380 ms, min = 8.211 us, total = 214.012 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1199 total (0 active), Execution time: mean = 109.695 us, total = 131.525 ms, Queueing time: mean = 107.581 us, max = 1.188 ms, min = 13.994 us, total = 128.990 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1199 total (0 active), Execution time: mean = 650.129 us, total = 779.505 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 401 total (1 active), Execution time: mean = 8.946 us, total = 3.587 ms, Queueing time: mean = 68.617 us, max = 363.446 us, min = 9.881 us, total = 27.516 ms [state-dump] NodeManager.GcsCheckAlive - 240 total (1 active), Execution time: mean = 304.053 us, total = 72.973 ms, Queueing time: mean = 616.985 us, max = 2.263 ms, min = 12.935 us, total = 148.076 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 240 total (0 active), Execution time: mean = 52.858 us, total = 12.686 ms, Queueing time: mean = 100.967 us, max = 307.469 us, min = 11.913 us, total = 24.232 ms [state-dump] NodeManager.deadline_timer.record_metrics - 240 total (1 active), Execution time: mean = 550.913 us, total = 132.219 ms, Queueing time: mean = 371.944 us, max = 1.812 ms, min = 10.977 us, total = 89.266 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 240 total (0 active), Execution time: mean = 1.538 ms, total = 369.100 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 120 total 
(1 active), Execution time: mean = 1.779 ms, total = 213.422 ms, Queueing time: mean = 65.177 us, max = 148.355 us, min = 16.678 us, total = 7.821 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, 
total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 20 total (1 active, 1 running), Execution time: mean = 2.632 ms, total = 52.642 ms, Queueing time: mean = 62.493 us, max = 150.858 us, min = 13.745 us, total = 1.250 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution 
time: mean = 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:06:50,438 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:06:50,518 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 110422 total (35 active) [state-dump] Queueing time: mean = 71.258 ms, 
max = 470.895 s, min = -0.000 s, total = 7868.476 s [state-dump] Execution time: mean = 11.031 ms, total = 1218.027 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 26439 total (0 active), Execution time: mean = 36.450 us, total = 963.693 ms, Queueing time: mean = 102.504 us, max = 3.225 ms, min = 2.778 us, total = 2.710 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 26439 total (0 active), Execution time: mean = 516.817 us, total = 13.664 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 12589 total (1 active), Execution time: mean = 3.150 us, total = 39.653 ms, Queueing time: mean = 95.848 us, max = 4.613 ms, min = 3.386 us, total = 1.207 s [state-dump] RaySyncer.OnDemandBroadcasting - 12589 total (1 active), Execution time: mean = 11.457 us, total = 144.234 ms, Queueing time: mean = 88.577 us, max = 4.610 ms, min = 7.347 us, total = 1.115 s [state-dump] ObjectManager.UpdateAvailableMemory - 12588 total (0 active), Execution time: mean = 5.761 us, total = 72.523 ms, Queueing time: mean = 101.463 us, max = 730.033 us, min = 2.098 us, total = 1.277 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6298 total (1 active), Execution time: mean = 18.824 us, total = 118.553 ms, Queueing time: mean = 76.130 us, max = 13.722 ms, min = 9.036 us, total = 479.465 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5030 total (1 active), Execution time: mean = 453.992 us, total = 2.284 s, Queueing time: mean = 73.960 us, max = 978.705 us, min = -0.000 s, total = 372.017 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1260 total (1 active), Execution time: mean = 15.407 us, total = 19.413 ms, Queueing time: mean = 62.315 us, max = 2.582 ms, min = 7.069 us, total = 78.517 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1260 total (1 active), Execution 
time: mean = 2.879 us, total = 3.627 ms, Queueing time: mean = 181.499 us, max = 2.379 ms, min = 4.508 us, total = 228.689 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1260 total (1 active), Execution time: mean = 8.921 us, total = 11.240 ms, Queueing time: mean = 177.535 us, max = 2.380 ms, min = 160.000 ns, total = 223.694 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1259 total (0 active), Execution time: mean = 109.400 us, total = 137.735 ms, Queueing time: mean = 107.713 us, max = 1.188 ms, min = 13.994 us, total = 135.611 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1259 total (0 active), Execution time: mean = 648.975 us, total = 817.060 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 421 total (1 active), Execution time: mean = 8.945 us, total = 3.766 ms, Queueing time: mean = 68.690 us, max = 363.446 us, min = 9.881 us, total = 28.918 ms [state-dump] NodeManager.GcsCheckAlive - 252 total (1 active), Execution time: mean = 305.152 us, total = 76.898 ms, Queueing time: mean = 611.575 us, max = 2.263 ms, min = 12.935 us, total = 154.117 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 252 total (0 active), Execution time: mean = 53.025 us, total = 13.362 ms, Queueing time: mean = 100.590 us, max = 307.469 us, min = 11.913 us, total = 25.349 ms [state-dump] NodeManager.deadline_timer.record_metrics - 252 total (1 active), Execution time: mean = 551.571 us, total = 138.996 ms, Queueing time: mean = 366.818 us, max = 1.812 ms, min = 10.977 us, total = 92.438 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 252 total (0 active), Execution time: mean = 1.545 ms, total = 389.451 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 126 
total (1 active), Execution time: mean = 1.772 ms, total = 223.233 ms, Queueing time: mean = 64.918 us, max = 148.355 us, min = 16.678 us, total = 8.180 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 
us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 21 total (1 active, 1 running), Execution time: mean = 2.602 ms, total = 54.634 ms, Queueing time: mean = 62.628 us, max = 150.858 us, min = 13.745 us, total = 1.315 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution 
time: mean = 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:07:50,438 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:07:50,521 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 115654 total (35 active) [state-dump] Queueing time: mean = 68.038 ms, 
max = 470.895 s, min = -0.000 s, total = 7868.910 s [state-dump] Execution time: mean = 10.540 ms, total = 1219.005 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 27699 total (0 active), Execution time: mean = 36.706 us, total = 1.017 s, Queueing time: mean = 103.084 us, max = 3.225 ms, min = 2.778 us, total = 2.855 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 27699 total (0 active), Execution time: mean = 518.318 us, total = 14.357 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 13188 total (1 active), Execution time: mean = 3.165 us, total = 41.735 ms, Queueing time: mean = 96.348 us, max = 4.613 ms, min = 3.386 us, total = 1.271 s [state-dump] RaySyncer.OnDemandBroadcasting - 13188 total (1 active), Execution time: mean = 11.531 us, total = 152.076 ms, Queueing time: mean = 89.021 us, max = 4.610 ms, min = 7.347 us, total = 1.174 s [state-dump] ObjectManager.UpdateAvailableMemory - 13187 total (0 active), Execution time: mean = 5.818 us, total = 76.717 ms, Queueing time: mean = 102.155 us, max = 730.033 us, min = 2.098 us, total = 1.347 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6598 total (1 active), Execution time: mean = 18.992 us, total = 125.307 ms, Queueing time: mean = 76.339 us, max = 13.722 ms, min = 9.036 us, total = 503.687 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5270 total (1 active), Execution time: mean = 455.071 us, total = 2.398 s, Queueing time: mean = 74.385 us, max = 978.705 us, min = -0.000 s, total = 392.006 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1320 total (1 active), Execution time: mean = 15.501 us, total = 20.462 ms, Queueing time: mean = 62.756 us, max = 2.582 ms, min = 7.069 us, total = 82.838 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1320 total (1 active), Execution time: 
mean = 2.898 us, total = 3.825 ms, Queueing time: mean = 182.055 us, max = 2.379 ms, min = 4.508 us, total = 240.312 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1320 total (1 active), Execution time: mean = 9.020 us, total = 11.906 ms, Queueing time: mean = 178.052 us, max = 2.380 ms, min = 160.000 ns, total = 235.028 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1319 total (0 active), Execution time: mean = 109.887 us, total = 144.941 ms, Queueing time: mean = 108.567 us, max = 1.188 ms, min = 13.994 us, total = 143.200 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1319 total (0 active), Execution time: mean = 650.758 us, total = 858.350 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 441 total (1 active), Execution time: mean = 9.049 us, total = 3.991 ms, Queueing time: mean = 69.524 us, max = 363.446 us, min = 9.881 us, total = 30.660 ms [state-dump] NodeManager.GcsCheckAlive - 264 total (1 active), Execution time: mean = 307.459 us, total = 81.169 ms, Queueing time: mean = 612.507 us, max = 2.263 ms, min = 12.935 us, total = 161.702 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 264 total (0 active), Execution time: mean = 53.312 us, total = 14.074 ms, Queueing time: mean = 100.772 us, max = 307.469 us, min = 11.913 us, total = 26.604 ms [state-dump] NodeManager.deadline_timer.record_metrics - 264 total (1 active), Execution time: mean = 552.919 us, total = 145.971 ms, Queueing time: mean = 368.722 us, max = 1.812 ms, min = 10.977 us, total = 97.343 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 264 total (0 active), Execution time: mean = 1.548 ms, total = 408.633 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 132 total (1 
active), Execution time: mean = 1.778 ms, total = 234.647 ms, Queueing time: mean = 65.832 us, max = 148.355 us, min = 16.678 us, total = 8.690 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 
28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 22 total (1 active, 1 running), Execution time: mean = 2.621 ms, total = 57.654 ms, Queueing time: mean = 62.941 us, max = 150.858 us, min = 13.745 us, total = 1.385 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:08:50,439 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:08:50,524 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 120885 total (35 active) [state-dump] Queueing time: mean = 65.097 ms, 
max = 470.895 s, min = -0.000 s, total = 7869.293 s [state-dump] Execution time: mean = 10.091 ms, total = 1219.910 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 28959 total (0 active), Execution time: mean = 36.665 us, total = 1.062 s, Queueing time: mean = 102.912 us, max = 3.225 ms, min = 2.778 us, total = 2.980 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 28959 total (0 active), Execution time: mean = 518.192 us, total = 15.006 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 13787 total (1 active), Execution time: mean = 3.162 us, total = 43.589 ms, Queueing time: mean = 96.280 us, max = 4.613 ms, min = 3.386 us, total = 1.327 s [state-dump] RaySyncer.OnDemandBroadcasting - 13787 total (1 active), Execution time: mean = 11.495 us, total = 158.476 ms, Queueing time: mean = 88.991 us, max = 4.610 ms, min = 7.347 us, total = 1.227 s [state-dump] ObjectManager.UpdateAvailableMemory - 13786 total (0 active), Execution time: mean = 5.816 us, total = 80.182 ms, Queueing time: mean = 102.146 us, max = 730.033 us, min = 2.098 us, total = 1.408 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6898 total (1 active), Execution time: mean = 19.015 us, total = 131.164 ms, Queueing time: mean = 76.326 us, max = 13.722 ms, min = 9.036 us, total = 526.498 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5509 total (1 active), Execution time: mean = 454.252 us, total = 2.502 s, Queueing time: mean = 74.221 us, max = 978.705 us, min = -0.000 s, total = 408.883 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1380 total (1 active), Execution time: mean = 15.493 us, total = 21.380 ms, Queueing time: mean = 63.242 us, max = 2.582 ms, min = 7.069 us, total = 87.274 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1380 total (1 active), Execution time: 
mean = 2.893 us, total = 3.993 ms, Queueing time: mean = 182.345 us, max = 2.379 ms, min = 4.508 us, total = 251.636 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1380 total (1 active), Execution time: mean = 9.031 us, total = 12.463 ms, Queueing time: mean = 178.334 us, max = 2.380 ms, min = 160.000 ns, total = 246.101 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1379 total (0 active), Execution time: mean = 109.745 us, total = 151.339 ms, Queueing time: mean = 108.483 us, max = 1.188 ms, min = 13.994 us, total = 149.597 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1379 total (0 active), Execution time: mean = 649.702 us, total = 895.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 461 total (1 active), Execution time: mean = 9.008 us, total = 4.153 ms, Queueing time: mean = 69.161 us, max = 363.446 us, min = 9.881 us, total = 31.883 ms [state-dump] NodeManager.GcsCheckAlive - 276 total (1 active), Execution time: mean = 309.366 us, total = 85.385 ms, Queueing time: mean = 612.418 us, max = 2.263 ms, min = 12.935 us, total = 169.027 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 276 total (0 active), Execution time: mean = 53.341 us, total = 14.722 ms, Queueing time: mean = 100.480 us, max = 307.469 us, min = 11.913 us, total = 27.733 ms [state-dump] NodeManager.deadline_timer.record_metrics - 276 total (1 active), Execution time: mean = 553.102 us, total = 152.656 ms, Queueing time: mean = 370.068 us, max = 1.812 ms, min = 10.977 us, total = 102.139 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 276 total (0 active), Execution time: mean = 1.549 ms, total = 427.506 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 138 total (1 
active), Execution time: mean = 1.778 ms, total = 245.391 ms, Queueing time: mean = 66.490 us, max = 148.355 us, min = 16.678 us, total = 9.176 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 
28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 23 total (1 active, 1 running), Execution time: mean = 2.608 ms, total = 59.975 ms, Queueing time: mean = 63.452 us, max = 150.858 us, min = 13.745 us, total = 1.459 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:09:50,439 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:09:50,526 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 126120 total (35 active) [state-dump] Queueing time: mean = 62.398 ms, 
max = 470.895 s, min = -0.000 s, total = 7869.694 s [state-dump] Execution time: mean = 9.680 ms, total = 1220.876 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 30219 total (0 active), Execution time: mean = 36.725 us, total = 1.110 s, Queueing time: mean = 103.228 us, max = 3.225 ms, min = 2.778 us, total = 3.119 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 30219 total (0 active), Execution time: mean = 519.871 us, total = 15.710 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 14387 total (1 active), Execution time: mean = 3.154 us, total = 45.378 ms, Queueing time: mean = 96.206 us, max = 4.613 ms, min = 3.386 us, total = 1.384 s [state-dump] RaySyncer.OnDemandBroadcasting - 14387 total (1 active), Execution time: mean = 11.418 us, total = 164.277 ms, Queueing time: mean = 88.982 us, max = 4.610 ms, min = 7.347 us, total = 1.280 s [state-dump] ObjectManager.UpdateAvailableMemory - 14386 total (0 active), Execution time: mean = 5.809 us, total = 83.564 ms, Queueing time: mean = 102.514 us, max = 730.033 us, min = 2.098 us, total = 1.475 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7198 total (1 active), Execution time: mean = 18.942 us, total = 136.341 ms, Queueing time: mean = 76.418 us, max = 13.722 ms, min = 9.036 us, total = 550.057 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5749 total (1 active), Execution time: mean = 453.589 us, total = 2.608 s, Queueing time: mean = 74.193 us, max = 978.705 us, min = -0.000 s, total = 426.536 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1440 total (1 active), Execution time: mean = 15.412 us, total = 22.194 ms, Queueing time: mean = 63.249 us, max = 2.582 ms, min = 7.069 us, total = 91.079 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1440 total (1 active), Execution time: 
mean = 2.884 us, total = 4.153 ms, Queueing time: mean = 182.052 us, max = 2.379 ms, min = 4.508 us, total = 262.155 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1440 total (1 active), Execution time: mean = 9.010 us, total = 12.974 ms, Queueing time: mean = 178.046 us, max = 2.380 ms, min = 160.000 ns, total = 256.386 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1439 total (0 active), Execution time: mean = 109.172 us, total = 157.099 ms, Queueing time: mean = 108.507 us, max = 1.188 ms, min = 13.994 us, total = 156.141 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1439 total (0 active), Execution time: mean = 649.600 us, total = 934.774 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 481 total (1 active), Execution time: mean = 9.018 us, total = 4.338 ms, Queueing time: mean = 69.230 us, max = 363.446 us, min = 9.881 us, total = 33.300 ms [state-dump] NodeManager.GcsCheckAlive - 288 total (1 active), Execution time: mean = 312.050 us, total = 89.870 ms, Queueing time: mean = 608.582 us, max = 2.263 ms, min = 12.935 us, total = 175.272 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 288 total (0 active), Execution time: mean = 53.276 us, total = 15.343 ms, Queueing time: mean = 100.364 us, max = 307.469 us, min = 11.913 us, total = 28.905 ms [state-dump] NodeManager.deadline_timer.record_metrics - 288 total (1 active), Execution time: mean = 554.474 us, total = 159.689 ms, Queueing time: mean = 367.762 us, max = 1.812 ms, min = 10.977 us, total = 105.915 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 288 total (0 active), Execution time: mean = 1.556 ms, total = 448.237 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 144 total (1 
active), Execution time: mean = 1.778 ms, total = 256.012 ms, Queueing time: mean = 66.998 us, max = 183.426 us, min = 16.285 us, total = 9.648 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 
28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 24 total (1 active, 1 running), Execution time: mean = 2.624 ms, total = 62.980 ms, Queueing time: mean = 63.635 us, max = 150.858 us, min = 13.745 us, total = 1.527 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:10:50,439 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:10:50,530 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 131350 total (35 active) [state-dump] Queueing time: mean = 59.917 ms, 
max = 470.895 s, min = -0.000 s, total = 7870.122 s [state-dump] Execution time: mean = 9.302 ms, total = 1221.864 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 31479 total (0 active), Execution time: mean = 36.898 us, total = 1.162 s, Queueing time: mean = 103.731 us, max = 3.225 ms, min = 2.778 us, total = 3.265 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 31479 total (0 active), Execution time: mean = 521.746 us, total = 16.424 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 14986 total (1 active), Execution time: mean = 3.160 us, total = 47.359 ms, Queueing time: mean = 96.512 us, max = 4.613 ms, min = 3.386 us, total = 1.446 s [state-dump] RaySyncer.OnDemandBroadcasting - 14986 total (1 active), Execution time: mean = 11.435 us, total = 171.370 ms, Queueing time: mean = 89.278 us, max = 4.610 ms, min = 7.347 us, total = 1.338 s [state-dump] ObjectManager.UpdateAvailableMemory - 14985 total (0 active), Execution time: mean = 5.836 us, total = 87.456 ms, Queueing time: mean = 103.203 us, max = 730.033 us, min = 2.098 us, total = 1.546 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7497 total (1 active), Execution time: mean = 18.948 us, total = 142.053 ms, Queueing time: mean = 76.363 us, max = 13.722 ms, min = 9.036 us, total = 572.495 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5988 total (1 active), Execution time: mean = 454.047 us, total = 2.719 s, Queueing time: mean = 74.358 us, max = 978.705 us, min = -0.000 s, total = 445.255 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1500 total (1 active), Execution time: mean = 15.413 us, total = 23.120 ms, Queueing time: mean = 63.641 us, max = 2.582 ms, min = 7.069 us, total = 95.462 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1500 total (1 active), Execution time: 
mean = 2.887 us, total = 4.330 ms, Queueing time: mean = 182.297 us, max = 2.379 ms, min = 4.508 us, total = 273.445 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1500 total (1 active), Execution time: mean = 9.021 us, total = 13.531 ms, Queueing time: mean = 178.279 us, max = 2.380 ms, min = 160.000 ns, total = 267.419 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1499 total (0 active), Execution time: mean = 108.948 us, total = 163.313 ms, Queueing time: mean = 108.956 us, max = 1.188 ms, min = 13.994 us, total = 163.325 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1499 total (0 active), Execution time: mean = 649.616 us, total = 973.774 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 501 total (1 active), Execution time: mean = 9.093 us, total = 4.555 ms, Queueing time: mean = 69.509 us, max = 363.446 us, min = 9.881 us, total = 34.824 ms [state-dump] NodeManager.GcsCheckAlive - 300 total (1 active), Execution time: mean = 313.025 us, total = 93.908 ms, Queueing time: mean = 608.553 us, max = 2.263 ms, min = 12.935 us, total = 182.566 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 300 total (0 active), Execution time: mean = 53.488 us, total = 16.046 ms, Queueing time: mean = 100.947 us, max = 307.469 us, min = 11.913 us, total = 30.284 ms [state-dump] NodeManager.deadline_timer.record_metrics - 300 total (1 active), Execution time: mean = 554.470 us, total = 166.341 ms, Queueing time: mean = 368.246 us, max = 1.812 ms, min = 10.977 us, total = 110.474 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 300 total (0 active), Execution time: mean = 1.563 ms, total = 468.962 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 150 total (1 
active), Execution time: mean = 1.778 ms, total = 266.677 ms, Queueing time: mean = 67.632 us, max = 183.426 us, min = 16.285 us, total = 10.145 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 
28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 25 total (1 active, 1 running), Execution time: mean = 2.619 ms, total = 65.469 ms, Queueing time: mean = 63.271 us, max = 150.858 us, min = 13.745 us, total = 1.582 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:11:50,440 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:11:50,533 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, accelerator_type:A40: 
10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing 
time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 136582 total (35 active) [state-dump] Queueing time: mean = 57.625 ms, 
max = 470.895 s, min = -0.000 s, total = 7870.572 s [state-dump] Execution time: mean = 8.952 ms, total = 1222.737 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 32739 total (0 active), Execution time: mean = 36.817 us, total = 1.205 s, Queueing time: mean = 103.304 us, max = 3.225 ms, min = 2.778 us, total = 3.382 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 32739 total (0 active), Execution time: mean = 520.580 us, total = 17.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 15585 total (1 active), Execution time: mean = 3.156 us, total = 49.184 ms, Queueing time: mean = 99.367 us, max = 51.449 ms, min = 3.386 us, total = 1.549 s [state-dump] RaySyncer.OnDemandBroadcasting - 15585 total (1 active), Execution time: mean = 11.377 us, total = 177.306 ms, Queueing time: mean = 92.183 us, max = 51.440 ms, min = 7.347 us, total = 1.437 s [state-dump] ObjectManager.UpdateAvailableMemory - 15584 total (0 active), Execution time: mean = 5.800 us, total = 90.385 ms, Queueing time: mean = 102.352 us, max = 730.033 us, min = 2.098 us, total = 1.595 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7797 total (1 active), Execution time: mean = 18.903 us, total = 147.388 ms, Queueing time: mean = 76.144 us, max = 13.722 ms, min = 9.036 us, total = 593.698 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6228 total (1 active), Execution time: mean = 453.662 us, total = 2.825 s, Queueing time: mean = 74.090 us, max = 978.705 us, min = -0.000 s, total = 461.433 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1560 total (1 active), Execution time: mean = 15.333 us, total = 23.919 ms, Queueing time: mean = 63.341 us, max = 2.582 ms, min = 7.069 us, total = 98.812 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1560 total (1 active), Execution time: 
mean = 2.917 us, total = 4.550 ms, Queueing time: mean = 182.439 us, max = 2.379 ms, min = 4.508 us, total = 284.605 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1560 total (1 active), Execution time: mean = 9.033 us, total = 14.091 ms, Queueing time: mean = 178.453 us, max = 2.380 ms, min = 160.000 ns, total = 278.386 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1559 total (0 active), Execution time: mean = 108.373 us, total = 168.954 ms, Queueing time: mean = 108.315 us, max = 1.188 ms, min = 13.934 us, total = 168.863 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1559 total (0 active), Execution time: mean = 647.093 us, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 521 total (1 active), Execution time: mean = 9.068 us, total = 4.724 ms, Queueing time: mean = 69.024 us, max = 363.446 us, min = 9.881 us, total = 35.961 ms [state-dump] NodeManager.GcsCheckAlive - 312 total (1 active), Execution time: mean = 313.719 us, total = 97.880 ms, Queueing time: mean = 609.500 us, max = 2.263 ms, min = 12.935 us, total = 190.164 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 312 total (0 active), Execution time: mean = 53.420 us, total = 16.667 ms, Queueing time: mean = 101.385 us, max = 307.469 us, min = 11.913 us, total = 31.632 ms [state-dump] NodeManager.deadline_timer.record_metrics - 312 total (1 active), Execution time: mean = 555.708 us, total = 173.381 ms, Queueing time: mean = 368.503 us, max = 1.812 ms, min = 10.977 us, total = 114.973 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 312 total (0 active), Execution time: mean = 1.563 ms, total = 487.772 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 156 total (1 
active), Execution time: mean = 1.781 ms, total = 277.787 ms, Queueing time: mean = 67.992 us, max = 183.426 us, min = 16.285 us, total = 10.607 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 117 total (21 active), Execution time: mean = 7.058 us, total = 825.818 us, Queueing time: mean = 67.182 s, max = 470.895 s, min = 23.644 us, total = 7860.242 s [state-dump] ClientConnection.async_read.ProcessMessage - 96 total (0 active), Execution time: mean = 794.794 us, total = 76.300 ms, Queueing time: mean = 19.696 us, max = 164.240 us, min = 2.397 us, total = 1.891 ms [state-dump] - 34 total (0 active), Execution time: mean = 889.353 ns, total = 30.238 us, Queueing time: mean = 73.833 us, max = 186.936 us, min = 20.527 us, total = 2.510 ms [state-dump] RaySyncer.BroadcastMessage - 34 total (0 active), Execution time: mean = 202.349 us, total = 6.880 ms, Queueing time: mean = 632.971 ns, max = 924.000 ns, min = 91.000 ns, total = 21.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 30 total (0 active), Execution time: mean = 556.684 us, total = 16.701 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 30 total (0 active), Execution time: mean = 105.102 us, total = 3.153 ms, Queueing time: mean = 87.315 us, max = 165.892 us, min = 19.400 us, total = 2.619 ms [state-dump] WorkerPool.PopWorkerCallback - 30 total (0 active), Execution time: mean = 39.558 us, total = 1.187 ms, Queueing time: mean = 224.723 us, max = 539.776 us, min = 15.433 us, total = 6.742 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 30 total (0 active), Execution time: mean = 88.582 us, total = 2.657 ms, Queueing time: mean = 226.281 us, max = 674.029 us, min = 19.147 us, total = 6.788 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 30 total (0 active), Execution time: mean = 959.639 us, total = 
28.789 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 26 total (1 active, 1 running), Execution time: mean = 2.634 ms, total = 68.492 ms, Queueing time: mean = 63.862 us, max = 150.858 us, min = 13.745 us, total = 1.660 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:12:50,440 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 
- bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:12:50,536 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, 
accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 141847 total (35 active) [state-dump] Queueing time: mean = 80.642 ms, 
max = 902.083 s, min = -0.000 s, total = 11438.874 s [state-dump] Execution time: mean = 8.626 ms, total = 1223.615 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 33999 total (0 active), Execution time: mean = 36.719 us, total = 1.248 s, Queueing time: mean = 103.322 us, max = 3.225 ms, min = 1.438 us, total = 3.513 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 33999 total (0 active), Execution time: mean = 519.582 us, total = 17.665 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 16185 total (1 active), Execution time: mean = 3.152 us, total = 51.018 ms, Queueing time: mean = 98.929 us, max = 51.449 ms, min = 3.386 us, total = 1.601 s [state-dump] RaySyncer.OnDemandBroadcasting - 16185 total (1 active), Execution time: mean = 11.374 us, total = 184.095 ms, Queueing time: mean = 91.744 us, max = 51.440 ms, min = 7.347 us, total = 1.485 s [state-dump] ObjectManager.UpdateAvailableMemory - 16184 total (0 active), Execution time: mean = 5.769 us, total = 93.370 ms, Queueing time: mean = 101.483 us, max = 730.033 us, min = 2.098 us, total = 1.642 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8097 total (1 active), Execution time: mean = 18.819 us, total = 152.379 ms, Queueing time: mean = 75.734 us, max = 13.722 ms, min = 9.036 us, total = 613.220 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6467 total (1 active), Execution time: mean = 453.351 us, total = 2.932 s, Queueing time: mean = 73.954 us, max = 978.705 us, min = -0.000 s, total = 478.261 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1620 total (1 active), Execution time: mean = 15.277 us, total = 24.749 ms, Queueing time: mean = 63.458 us, max = 2.582 ms, min = 7.069 us, total = 102.803 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1620 total (1 active), Execution 
time: mean = 2.922 us, total = 4.734 ms, Queueing time: mean = 181.951 us, max = 2.379 ms, min = 4.508 us, total = 294.761 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1620 total (1 active), Execution time: mean = 9.072 us, total = 14.697 ms, Queueing time: mean = 177.945 us, max = 2.380 ms, min = 160.000 ns, total = 288.270 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1619 total (0 active), Execution time: mean = 107.599 us, total = 174.203 ms, Queueing time: mean = 108.004 us, max = 1.188 ms, min = 5.683 us, total = 174.859 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1619 total (0 active), Execution time: mean = 643.484 us, total = 1.042 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 541 total (1 active), Execution time: mean = 9.026 us, total = 4.883 ms, Queueing time: mean = 68.832 us, max = 363.446 us, min = 9.881 us, total = 37.238 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 324 total (0 active), Execution time: mean = 53.528 us, total = 17.343 ms, Queueing time: mean = 101.565 us, max = 307.469 us, min = 11.913 us, total = 32.907 ms [state-dump] NodeManager.GcsCheckAlive - 324 total (1 active), Execution time: mean = 313.212 us, total = 101.481 ms, Queueing time: mean = 607.866 us, max = 2.263 ms, min = 12.935 us, total = 196.949 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 324 total (0 active), Execution time: mean = 1.558 ms, total = 504.713 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 324 total (1 active), Execution time: mean = 553.944 us, total = 179.478 ms, Queueing time: mean = 368.530 us, max = 1.812 ms, min = 10.977 us, total = 119.404 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 162 total 
(1 active), Execution time: mean = 1.778 ms, total = 287.997 ms, Queueing time: mean = 68.775 us, max = 183.426 us, min = 16.285 us, total = 11.141 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total 
= 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 27 total (1 active, 1 running), Execution time: mean = 2.645 ms, total = 71.408 ms, Queueing time: mean = 63.529 us, max = 150.858 us, min = 13.745 us, total = 1.715 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: 
mean = 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] 
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: 
mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:13:50,440 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:13:50,539 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], 
node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 
[state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively 
pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 
[state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 147079 total (35 active) [state-dump] Queueing time: mean = 77.776 ms, max = 902.083 s, min = -0.000 s, total = 11439.203 s [state-dump] Execution time: mean = 8.325 ms, total = 1224.462 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 35259 total (0 active), Execution time: mean = 36.515 us, total = 1.287 s, Queueing time: mean = 102.638 us, max = 3.225 ms, min = 1.438 us, total = 3.619 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 35259 total (0 active), Execution time: mean = 518.354 us, total = 18.277 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 16784 total (1 active), Execution time: mean = 3.141 us, total = 52.712 ms, Queueing time: mean = 98.257 us, max = 51.449 ms, min = 3.386 us, total = 1.649 s [state-dump] RaySyncer.OnDemandBroadcasting - 16784 total (1 active), Execution time: mean = 11.291 us, total = 189.505 ms, Queueing time: mean = 91.140 us, max = 51.440 ms, min = 7.347 us, total = 1.530 s [state-dump] ObjectManager.UpdateAvailableMemory - 16783 total (0 active), Execution time: mean = 5.728 us, total = 96.130 ms, Queueing time: mean = 100.599 us, max = 730.033 us, min = 2.098 us, total = 1.688 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8397 total (1 active), Execution time: mean = 18.702 us, total = 157.039 ms, Queueing time: mean = 75.428 us, max = 13.722 ms, min = 9.036 us, total = 633.368 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6707 total (1 active), Execution time: mean = 452.274 us, total = 3.033 s, Queueing time: mean = 73.501 us, max = 978.705 us, min = -0.000 s, total = 492.971 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1680 total (1 active), Execution time: mean = 15.211 us, total = 
25.554 ms, Queueing time: mean = 63.227 us, max = 2.582 ms, min = 6.718 us, total = 106.222 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1680 total (1 active), Execution time: mean = 2.914 us, total = 4.896 ms, Queueing time: mean = 182.650 us, max = 2.379 ms, min = 4.508 us, total = 306.852 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1680 total (1 active), Execution time: mean = 9.040 us, total = 15.187 ms, Queueing time: mean = 178.660 us, max = 2.380 ms, min = 160.000 ns, total = 300.149 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1679 total (0 active), Execution time: mean = 106.671 us, total = 179.100 ms, Queueing time: mean = 107.195 us, max = 1.188 ms, min = 5.683 us, total = 179.981 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1679 total (0 active), Execution time: mean = 639.953 us, total = 1.074 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 561 total (1 active), Execution time: mean = 9.002 us, total = 5.050 ms, Queueing time: mean = 68.414 us, max = 363.446 us, min = 9.881 us, total = 38.381 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 336 total (0 active), Execution time: mean = 53.108 us, total = 17.844 ms, Queueing time: mean = 100.382 us, max = 307.469 us, min = 11.913 us, total = 33.728 ms [state-dump] NodeManager.GcsCheckAlive - 336 total (1 active), Execution time: mean = 311.192 us, total = 104.561 ms, Queueing time: mean = 613.339 us, max = 2.263 ms, min = 12.935 us, total = 206.082 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 336 total (0 active), Execution time: mean = 1.554 ms, total = 522.051 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 336 total (1 active), 
Execution time: mean = 552.496 us, total = 185.639 ms, Queueing time: mean = 373.212 us, max = 1.812 ms, min = 10.977 us, total = 125.399 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 168 total (1 active), Execution time: mean = 1.783 ms, total = 299.540 ms, Queueing time: mean = 68.737 us, max = 183.426 us, min = 16.285 us, total = 11.548 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, 
Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total = 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 28 total (1 active, 1 running), Execution time: mean = 2.655 ms, total = 74.348 ms, Queueing time: mean = 62.159 us, max = 150.858 us, min = 13.745 us, total = 1.740 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 
active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:14:50,441 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:14:50,541 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 
[state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num 
objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - 
first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL 
[state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 152313 total (35 active) [state-dump] Queueing time: mean = 75.106 ms, max = 902.083 s, min = -0.000 s, total = 11439.595 s [state-dump] Execution time: mean = 8.045 ms, total = 1225.379 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 36519 total (0 active), Execution time: mean = 36.505 us, total = 1.333 s, Queueing time: mean = 102.967 us, max = 3.225 ms, min = 1.438 us, total = 3.760 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 36519 total (0 active), Execution time: mean = 518.634 us, total = 18.940 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 17384 total (1 active), Execution time: mean = 3.140 us, total = 54.588 ms, Queueing time: mean = 98.118 us, max = 51.449 ms, min = 3.386 us, total = 1.706 s [state-dump] RaySyncer.OnDemandBroadcasting - 17384 total (1 active), Execution time: mean = 11.272 us, total = 195.961 ms, Queueing time: mean = 91.016 us, max = 51.440 ms, min = 7.347 us, total = 1.582 s [state-dump] ObjectManager.UpdateAvailableMemory - 17383 total (0 active), Execution time: mean = 5.717 us, total = 99.387 ms, Queueing time: mean = 100.398 us, max = 837.793 us, min = 2.098 us, total = 1.745 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8697 total (1 active), Execution time: mean = 18.704 us, total = 162.666 ms, Queueing time: mean = 75.396 us, max = 13.722 ms, min = 9.036 us, total = 655.718 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6946 total (1 active), Execution time: 
mean = 451.703 us, total = 3.138 s, Queueing time: mean = 73.395 us, max = 978.705 us, min = -0.000 s, total = 509.800 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1740 total (1 active), Execution time: mean = 15.189 us, total = 26.428 ms, Queueing time: mean = 63.119 us, max = 2.582 ms, min = 6.718 us, total = 109.828 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1740 total (1 active), Execution time: mean = 2.926 us, total = 5.091 ms, Queueing time: mean = 182.541 us, max = 2.379 ms, min = 4.508 us, total = 317.622 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1740 total (1 active), Execution time: mean = 9.066 us, total = 15.774 ms, Queueing time: mean = 178.562 us, max = 2.380 ms, min = 160.000 ns, total = 310.698 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1739 total (0 active), Execution time: mean = 106.324 us, total = 184.898 ms, Queueing time: mean = 107.301 us, max = 1.188 ms, min = 5.683 us, total = 186.597 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1739 total (0 active), Execution time: mean = 638.867 us, total = 1.111 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 581 total (1 active), Execution time: mean = 8.981 us, total = 5.218 ms, Queueing time: mean = 68.155 us, max = 363.446 us, min = 9.881 us, total = 39.598 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 348 total (0 active), Execution time: mean = 53.191 us, total = 18.510 ms, Queueing time: mean = 100.765 us, max = 307.469 us, min = 11.913 us, total = 35.066 ms [state-dump] NodeManager.GcsCheckAlive - 348 total (1 active), Execution time: mean = 311.841 us, total = 108.521 ms, Queueing time: mean = 611.871 us, max = 2.263 ms, min = 12.935 us, total = 212.931 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 348 
total (0 active), Execution time: mean = 1.552 ms, total = 540.117 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 348 total (1 active), Execution time: mean = 552.669 us, total = 192.329 ms, Queueing time: mean = 371.880 us, max = 1.812 ms, min = 10.977 us, total = 129.414 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 174 total (1 active), Execution time: mean = 1.782 ms, total = 310.050 ms, Queueing time: mean = 68.558 us, max = 183.426 us, min = 16.285 us, total = 11.929 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total = 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 29 total (1 active, 1 running), Execution time: mean = 2.665 ms, total = 77.290 ms, Queueing time: mean = 62.166 us, max = 150.858 us, min = 13.745 us, total = 1.803 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, 
Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.660 s, total = 1198.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 377.345 us, total = 1.132 ms, Queueing time: mean = 149.043 us, max = 238.879 us, min = 70.382 us, total = 447.128 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.612 us, total = 9.223 us, Queueing time: mean = 43.822 us, max = 87.645 us, min = 87.645 us, total = 87.645 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:15:50,441 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:15:50,545 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 157548 total (35 active) [state-dump] Queueing time: mean = 72.613 ms, max = 902.083 s, min = -0.000 s, total = 11440.019 s [state-dump] Execution time: mean = 11.592 ms, total = 1826.356 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 37779 total (0 active), Execution time: mean = 36.595 us, total = 1.383 s, Queueing time: mean = 103.353 us, max = 3.225 ms, min = 1.438 us, total = 3.905 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 37779 total (0 active), Execution time: mean = 519.808 us, total = 19.638 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 17983 total (1 active), Execution time: mean = 3.151 us, total = 56.669 ms, Queueing time: mean = 98.366 us, max = 51.449 ms, min = 3.386 us, total = 1.769 s [state-dump] RaySyncer.OnDemandBroadcasting - 17983 total (1 active), Execution time: mean = 11.311 us, total = 203.402 ms, Queueing time: mean = 91.239 us, max = 51.440 ms, min = 7.347 us, total = 1.641 s [state-dump] ObjectManager.UpdateAvailableMemory - 17982 total (0 active), Execution time: mean = 5.752 us, total = 103.441 ms, Queueing time: mean = 100.587 us, max = 837.793 us, min = 2.098 us, total = 1.809 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 8997 total (1 active), Execution time: mean = 18.759 us, total = 168.777 ms, Queueing time: mean = 75.455 us, max = 13.722 ms, min = 9.036 us, total = 678.868 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7186 total (1 active), Execution time: mean = 452.829 us, total = 3.254 s, Queueing time: mean = 73.720 us, max = 978.705 us, min = -0.000 s, total = 529.750 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1800 total (1 active), Execution time: mean = 2.938 us, total = 5.289 ms, Queueing time: mean = 182.957 us, max = 2.379 ms, min = 4.508 us, total = 329.322 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1800 total (1 active), Execution time: mean = 15.284 us, total = 27.511 ms, Queueing time: mean = 63.725 us, max = 2.582 ms, min = 6.718 us, total = 114.705 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1800 total (1 active), Execution time: mean = 9.156 us, total = 16.481 ms, Queueing time: mean = 178.928 us, max = 2.380 ms, min = 160.000 ns, total = 322.070 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1799 total (0 active), Execution time: mean = 106.291 us, total = 191.218 ms, Queueing time: mean = 107.383 us, max = 1.188 ms, min = 5.071 us, total = 193.182 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1799 total (0 active), Execution time: mean = 638.437 us, total = 1.149 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 601 total (1 active), Execution time: mean = 9.059 us, total = 5.445 ms, Queueing time: mean = 68.748 us, max = 363.446 us, min = 9.881 us, total = 41.317 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 360 total (0 active), Execution time: mean = 53.562 us, total = 19.282 ms, Queueing time: mean = 100.884 us, max = 307.469 us, 
min = 11.913 us, total = 36.318 ms [state-dump] NodeManager.GcsCheckAlive - 360 total (1 active), Execution time: mean = 313.810 us, total = 112.971 ms, Queueing time: mean = 612.006 us, max = 2.263 ms, min = 12.935 us, total = 220.322 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 360 total (0 active), Execution time: mean = 1.556 ms, total = 560.317 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 360 total (1 active), Execution time: mean = 553.333 us, total = 199.200 ms, Queueing time: mean = 373.405 us, max = 1.812 ms, min = 10.977 us, total = 134.426 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 180 total (1 active), Execution time: mean = 1.785 ms, total = 321.368 ms, Queueing time: mean = 69.173 us, max = 183.426 us, min = 16.285 us, total = 12.451 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total = 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 30 total (1 active, 1 running), Execution time: mean = 2.674 ms, total = 80.223 ms, Queueing time: mean = 63.019 us, max = 150.858 us, min = 13.745 us, total = 1.891 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 
2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:16:50,441 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:16:50,548 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== 
Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of 
spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - 
create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 162782 total (35 active) [state-dump] Queueing time: mean = 70.281 ms, max = 902.083 s, min = -0.000 s, total = 11440.429 s [state-dump] Execution time: mean = 11.225 ms, total = 1827.294 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 39039 total (0 active), Execution time: mean = 36.595 us, total = 1.429 s, Queueing time: mean = 103.591 us, max = 3.225 ms, min = 1.438 us, total = 4.044 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 39039 total (0 active), Execution time: mean = 520.289 us, total = 20.312 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 18583 total (1 active), Execution time: mean = 3.155 us, total = 58.625 ms, Queueing time: mean = 98.460 us, max = 51.449 ms, min = 3.386 us, total = 1.830 s [state-dump] RaySyncer.OnDemandBroadcasting - 18583 total (1 active), Execution time: mean = 11.320 us, total = 210.358 ms, Queueing time: mean = 91.328 us, max = 51.440 ms, min = 7.347 
us, total = 1.697 s [state-dump] ObjectManager.UpdateAvailableMemory - 18582 total (0 active), Execution time: mean = 5.767 us, total = 107.169 ms, Queueing time: mean = 100.543 us, max = 837.793 us, min = 2.098 us, total = 1.868 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9297 total (1 active), Execution time: mean = 18.846 us, total = 175.211 ms, Queueing time: mean = 75.757 us, max = 13.722 ms, min = 9.036 us, total = 704.311 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7425 total (1 active), Execution time: mean = 453.206 us, total = 3.365 s, Queueing time: mean = 74.044 us, max = 978.705 us, min = -0.000 s, total = 549.776 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1860 total (1 active), Execution time: mean = 2.939 us, total = 5.466 ms, Queueing time: mean = 183.176 us, max = 2.379 ms, min = 4.508 us, total = 340.707 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1860 total (1 active), Execution time: mean = 15.313 us, total = 28.481 ms, Queueing time: mean = 63.886 us, max = 2.582 ms, min = 6.718 us, total = 118.829 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1860 total (1 active), Execution time: mean = 9.185 us, total = 17.083 ms, Queueing time: mean = 179.134 us, max = 2.380 ms, min = 160.000 ns, total = 333.189 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1859 total (0 active), Execution time: mean = 105.852 us, total = 196.779 ms, Queueing time: mean = 107.185 us, max = 1.188 ms, min = 5.071 us, total = 199.258 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1859 total (0 active), Execution time: mean = 636.884 us, total = 1.184 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 621 total (1 active), Execution time: mean = 9.077 us, total = 5.637 ms, Queueing time: mean = 68.769 us, max = 363.446 
us, min = 9.881 us, total = 42.706 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 372 total (0 active), Execution time: mean = 53.549 us, total = 19.920 ms, Queueing time: mean = 101.040 us, max = 307.469 us, min = 11.913 us, total = 37.587 ms [state-dump] NodeManager.GcsCheckAlive - 372 total (1 active), Execution time: mean = 314.146 us, total = 116.862 ms, Queueing time: mean = 613.183 us, max = 2.263 ms, min = 12.935 us, total = 228.104 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 372 total (0 active), Execution time: mean = 1.556 ms, total = 578.974 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 372 total (1 active), Execution time: mean = 553.353 us, total = 205.847 ms, Queueing time: mean = 375.067 us, max = 1.812 ms, min = 10.977 us, total = 139.525 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 186 total (1 active), Execution time: mean = 1.789 ms, total = 332.691 ms, Queueing time: mean = 69.255 us, max = 183.426 us, min = 16.285 us, total = 12.881 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total = 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 31 total (1 active, 1 running), Execution time: mean = 2.691 ms, total = 83.409 ms, Queueing time: mean = 63.741 us, max = 150.858 us, min = 13.745 us, total = 1.976 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 
284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:17:50,442 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:17:50,551 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] 
InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, memory: 846480855040000, node:__internal_head__: 10000}}, "available": {GPU: 20000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} 
{ "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 
[state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 168014 total (35 active) [state-dump] Queueing time: mean = 68.094 ms, 
max = 902.083 s, min = -0.000 s, total = 11440.823 s [state-dump] Execution time: mean = 10.881 ms, total = 1828.180 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 40299 total (0 active), Execution time: mean = 36.481 us, total = 1.470 s, Queueing time: mean = 103.609 us, max = 3.225 ms, min = 1.438 us, total = 4.175 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 40299 total (0 active), Execution time: mean = 519.910 us, total = 20.952 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 19182 total (1 active), Execution time: mean = 3.150 us, total = 60.427 ms, Queueing time: mean = 98.414 us, max = 51.449 ms, min = 3.386 us, total = 1.888 s [state-dump] RaySyncer.OnDemandBroadcasting - 19182 total (1 active), Execution time: mean = 11.296 us, total = 216.672 ms, Queueing time: mean = 91.300 us, max = 51.440 ms, min = 7.347 us, total = 1.751 s [state-dump] ObjectManager.UpdateAvailableMemory - 19181 total (0 active), Execution time: mean = 5.757 us, total = 110.419 ms, Queueing time: mean = 100.689 us, max = 837.793 us, min = 2.098 us, total = 1.931 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9597 total (1 active), Execution time: mean = 18.799 us, total = 180.411 ms, Queueing time: mean = 75.616 us, max = 13.722 ms, min = 9.036 us, total = 725.689 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7665 total (1 active), Execution time: mean = 452.848 us, total = 3.471 s, Queueing time: mean = 73.810 us, max = 978.705 us, min = -0.000 s, total = 565.751 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1920 total (1 active), Execution time: mean = 2.934 us, total = 5.634 ms, Queueing time: mean = 183.437 us, max = 2.379 ms, min = 4.508 us, total = 352.200 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1920 total (1 active), Execution 
time: mean = 15.250 us, total = 29.280 ms, Queueing time: mean = 64.101 us, max = 2.582 ms, min = 6.718 us, total = 123.075 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1920 total (1 active), Execution time: mean = 9.183 us, total = 17.631 ms, Queueing time: mean = 179.393 us, max = 2.380 ms, min = 160.000 ns, total = 344.435 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1919 total (0 active), Execution time: mean = 105.399 us, total = 202.260 ms, Queueing time: mean = 107.036 us, max = 1.188 ms, min = 5.071 us, total = 205.403 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1919 total (0 active), Execution time: mean = 634.632 us, total = 1.218 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 641 total (1 active), Execution time: mean = 9.056 us, total = 5.805 ms, Queueing time: mean = 68.500 us, max = 363.446 us, min = 9.881 us, total = 43.909 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 384 total (0 active), Execution time: mean = 53.462 us, total = 20.529 ms, Queueing time: mean = 101.458 us, max = 307.469 us, min = 11.913 us, total = 38.960 ms [state-dump] NodeManager.GcsCheckAlive - 384 total (1 active), Execution time: mean = 314.083 us, total = 120.608 ms, Queueing time: mean = 614.986 us, max = 2.263 ms, min = 12.074 us, total = 236.155 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 384 total (0 active), Execution time: mean = 1.550 ms, total = 595.313 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 384 total (1 active), Execution time: mean = 551.527 us, total = 211.787 ms, Queueing time: mean = 378.567 us, max = 1.812 ms, min = 10.977 us, total = 145.370 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 192 total 
(1 active), Execution time: mean = 1.791 ms, total = 343.850 ms, Queueing time: mean = 69.253 us, max = 183.426 us, min = 16.285 us, total = 13.297 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 122 total (21 active), Execution time: mean = 7.365 us, total = 898.545 us, Queueing time: mean = 93.674 s, max = 902.083 s, min = 23.644 us, total = 11428.182 s [state-dump] ClientConnection.async_read.ProcessMessage - 101 total (0 active), Execution time: mean = 760.107 us, total = 76.771 ms, Queueing time: mean = 19.894 us, max = 164.240 us, min = 2.397 us, total = 2.009 ms [state-dump] - 36 total (0 active), Execution time: mean = 890.333 ns, total = 32.052 us, Queueing time: mean = 72.072 us, max = 186.936 us, min = 20.527 us, total = 2.595 ms [state-dump] RaySyncer.BroadcastMessage - 36 total (0 active), Execution time: mean = 202.082 us, total = 7.275 ms, Queueing time: mean = 635.167 ns, max = 924.000 ns, min = 91.000 ns, total = 22.866 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 33 total (0 active), Execution time: mean = 550.187 us, total = 18.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 33 total (0 active), Execution time: mean = 101.288 us, total = 3.342 ms, Queueing time: mean = 82.491 us, max = 165.892 us, min = 19.400 us, total = 2.722 ms [state-dump] WorkerPool.PopWorkerCallback - 33 total (0 active), Execution time: mean = 39.520 us, total = 1.304 ms, Queueing time: mean = 220.100 us, max = 539.776 us, min = 15.433 us, total = 7.263 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 33 total (0 active), Execution time: mean = 93.106 us, total = 3.073 ms, Queueing time: mean = 231.157 us, max = 674.029 us, min = 19.147 us, total = 7.628 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 33 total (0 active), Execution time: mean = 1.005 ms, total 
= 33.175 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 32 total (1 active, 1 running), Execution time: mean = 2.700 ms, total = 86.387 ms, Queueing time: mean = 63.431 us, max = 150.858 us, min = 13.745 us, total = 2.030 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: 
mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] 
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: 
mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:18:50,442 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:18:50,554 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], 
node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 
[state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively 
pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 
[state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 173630 total (35 active) [state-dump] Queueing time: mean = 232.485 ms, max = 1921.160 s, min = -0.000 s, total = 40366.341 s [state-dump] Execution time: mean = 10.558 ms, total = 1833.180 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 41577 total (0 active), Execution time: mean = 36.553 us, total = 1.520 s, Queueing time: mean = 104.052 us, max = 3.225 ms, min = 1.438 us, total = 4.326 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 41577 total (0 active), Execution time: mean = 520.808 us, total = 21.654 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 19781 total (1 active), Execution time: mean = 3.154 us, total = 62.386 ms, Queueing time: mean = 98.538 us, max = 51.449 ms, min = 3.386 us, total = 1.949 s [state-dump] RaySyncer.OnDemandBroadcasting - 19781 total (1 active), Execution time: mean = 11.464 us, total = 226.760 ms, Queueing time: mean = 91.266 us, max = 51.440 ms, min = 7.347 us, total = 1.805 s [state-dump] ObjectManager.UpdateAvailableMemory - 19780 total (0 active), Execution time: mean = 5.776 us, total = 114.257 ms, Queueing time: mean = 101.003 us, max = 837.793 us, min = 2.098 us, total = 1.998 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9896 total (1 active), Execution time: mean = 18.817 us, total = 186.217 ms, Queueing time: mean = 75.666 us, max = 13.722 ms, min = 9.036 us, total = 748.789 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7904 total (1 active), Execution time: mean = 453.210 us, total = 3.582 s, Queueing time: mean = 73.860 us, max = 978.705 us, min = -0.000 s, total = 583.790 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1980 total (1 active), Execution time: mean = 9.214 
us, total = 18.243 ms, Queueing time: mean = 179.482 us, max = 2.380 ms, min = 160.000 ns, total = 355.374 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1980 total (1 active), Execution time: mean = 15.299 us, total = 30.292 ms, Queueing time: mean = 64.336 us, max = 2.582 ms, min = 6.718 us, total = 127.386 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1980 total (1 active), Execution time: mean = 2.941 us, total = 5.824 ms, Queueing time: mean = 183.539 us, max = 2.379 ms, min = 4.508 us, total = 363.407 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1979 total (0 active), Execution time: mean = 105.298 us, total = 208.384 ms, Queueing time: mean = 107.431 us, max = 1.188 ms, min = 5.071 us, total = 212.606 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1979 total (0 active), Execution time: mean = 634.477 us, total = 1.256 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 661 total (1 active), Execution time: mean = 9.151 us, total = 6.049 ms, Queueing time: mean = 68.588 us, max = 363.446 us, min = 9.881 us, total = 45.337 ms [state-dump] NodeManager.GcsCheckAlive - 396 total (1 active), Execution time: mean = 314.975 us, total = 124.730 ms, Queueing time: mean = 614.474 us, max = 2.263 ms, min = 12.074 us, total = 243.332 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 396 total (0 active), Execution time: mean = 53.538 us, total = 21.201 ms, Queueing time: mean = 101.956 us, max = 307.469 us, min = 11.913 us, total = 40.375 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 396 total (0 active), Execution time: mean = 1.553 ms, total = 614.867 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 396 total (1 active), 
Execution time: mean = 550.694 us, total = 218.075 ms, Queueing time: mean = 379.640 us, max = 1.812 ms, min = 10.977 us, total = 150.337 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 198 total (1 active), Execution time: mean = 1.792 ms, total = 354.805 ms, Queueing time: mean = 68.952 us, max = 183.426 us, min = 16.285 us, total = 13.652 ms [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 
ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, total = 9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 33 total (1 active, 1 running), Execution time: mean = 2.713 ms, total = 89.528 ms, Queueing time: mean = 62.212 us, max = 150.858 us, min = 13.745 us, total = 2.053 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing 
time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), 
Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:19:50,442 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:19:50,557 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] 
Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] 
Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for 
pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 178865 total (35 active) [state-dump] Queueing time: mean = 225.683 ms, max = 1921.160 s, min = -0.000 s, total = 40366.778 s [state-dump] Execution time: mean = 10.254 ms, total = 1834.145 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 42837 total (0 active), Execution time: mean = 36.623 us, total = 1.569 s, Queueing time: mean = 104.473 us, max = 3.225 ms, min = 1.438 us, total = 4.475 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 42837 total (0 active), Execution time: mean = 521.594 us, total = 22.344 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 20381 total (1 active), Execution time: mean = 3.157 us, total = 64.342 ms, Queueing time: mean = 98.735 us, max = 51.449 ms, min = 3.386 us, total = 2.012 s [state-dump] RaySyncer.OnDemandBroadcasting - 20381 total (1 active), Execution time: mean = 11.478 us, total = 233.928 ms, Queueing time: mean = 91.454 us, max = 51.440 ms, min = 7.347 us, total = 1.864 s [state-dump] ObjectManager.UpdateAvailableMemory - 20380 total (0 active), 
Execution time: mean = 5.802 us, total = 118.244 ms, Queueing time: mean = 101.430 us, max = 837.793 us, min = 2.098 us, total = 2.067 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10196 total (1 active), Execution time: mean = 18.904 us, total = 192.750 ms, Queueing time: mean = 75.928 us, max = 13.722 ms, min = 9.036 us, total = 774.163 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8144 total (1 active), Execution time: mean = 453.638 us, total = 3.694 s, Queueing time: mean = 74.029 us, max = 978.705 us, min = -0.000 s, total = 602.893 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2040 total (1 active), Execution time: mean = 9.256 us, total = 18.882 ms, Queueing time: mean = 179.989 us, max = 2.380 ms, min = 160.000 ns, total = 367.178 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2040 total (1 active), Execution time: mean = 15.343 us, total = 31.300 ms, Queueing time: mean = 64.573 us, max = 2.582 ms, min = 6.718 us, total = 131.730 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2040 total (1 active), Execution time: mean = 2.947 us, total = 6.013 ms, Queueing time: mean = 184.070 us, max = 2.379 ms, min = 4.508 us, total = 375.504 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2039 total (0 active), Execution time: mean = 105.146 us, total = 214.393 ms, Queueing time: mean = 108.178 us, max = 1.188 ms, min = 5.071 us, total = 220.575 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2039 total (0 active), Execution time: mean = 634.800 us, total = 1.294 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 681 total (1 active), Execution time: mean = 9.169 us, total = 6.244 ms, Queueing time: mean = 68.835 us, max = 363.446 us, min = 9.881 us, total = 46.877 ms [state-dump] NodeManager.GcsCheckAlive - 408 total (1 
active), Execution time: mean = 316.617 us, total = 129.180 ms, Queueing time: mean = 615.673 us, max = 2.263 ms, min = 12.074 us, total = 251.195 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 408 total (0 active), Execution time: mean = 53.885 us, total = 21.985 ms, Queueing time: mean = 102.700 us, max = 307.469 us, min = 11.913 us, total = 41.902 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 408 total (0 active), Execution time: mean = 1.556 ms, total = 634.965 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 408 total (1 active), Execution time: mean = 551.871 us, total = 225.163 ms, Queueing time: mean = 381.211 us, max = 1.812 ms, min = 10.977 us, total = 155.534 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 204 total (1 active), Execution time: mean = 1.796 ms, total = 366.407 ms, Queueing time: mean = 69.706 us, max = 183.426 us, min = 16.285 us, total = 14.220 ms [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, total = 9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 34 total (1 active, 1 running), Execution time: mean = 2.713 ms, total = 92.259 ms, Queueing time: mean = 62.346 us, max = 150.858 us, min = 13.745 us, total = 2.120 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), 
Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean 
= 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:20:50,443 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes 
restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:20:50,560 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": 
{object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event 
stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 184096 total (35 active) [state-dump] Queueing time: mean = 219.272 ms, 
max = 1921.160 s, min = -0.000 s, total = 40367.182 s [state-dump] Execution time: mean = 9.968 ms, total = 1835.041 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 44097 total (0 active), Execution time: mean = 36.568 us, total = 1.613 s, Queueing time: mean = 104.799 us, max = 3.225 ms, min = 1.438 us, total = 4.621 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 44097 total (0 active), Execution time: mean = 521.327 us, total = 22.989 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 20980 total (1 active), Execution time: mean = 3.155 us, total = 66.202 ms, Queueing time: mean = 98.627 us, max = 51.449 ms, min = 3.386 us, total = 2.069 s [state-dump] RaySyncer.OnDemandBroadcasting - 20980 total (1 active), Execution time: mean = 11.454 us, total = 240.301 ms, Queueing time: mean = 91.367 us, max = 51.440 ms, min = 7.347 us, total = 1.917 s [state-dump] ObjectManager.UpdateAvailableMemory - 20979 total (0 active), Execution time: mean = 5.799 us, total = 121.665 ms, Queueing time: mean = 101.663 us, max = 837.793 us, min = 2.098 us, total = 2.133 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10496 total (1 active), Execution time: mean = 18.900 us, total = 198.377 ms, Queueing time: mean = 75.796 us, max = 13.722 ms, min = 9.036 us, total = 795.559 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8383 total (1 active), Execution time: mean = 453.457 us, total = 3.801 s, Queueing time: mean = 73.942 us, max = 978.705 us, min = -0.000 s, total = 619.854 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2100 total (1 active), Execution time: mean = 9.255 us, total = 19.434 ms, Queueing time: mean = 179.376 us, max = 2.380 ms, min = 160.000 ns, total = 376.689 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2100 total (1 active), Execution time: mean = 
15.304 us, total = 32.139 ms, Queueing time: mean = 64.543 us, max = 2.582 ms, min = 6.718 us, total = 135.541 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2100 total (1 active), Execution time: mean = 2.946 us, total = 6.187 ms, Queueing time: mean = 183.456 us, max = 2.379 ms, min = 4.508 us, total = 385.257 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2099 total (0 active), Execution time: mean = 104.846 us, total = 220.072 ms, Queueing time: mean = 108.529 us, max = 1.188 ms, min = 5.071 us, total = 227.803 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2099 total (0 active), Execution time: mean = 633.810 us, total = 1.330 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 701 total (1 active), Execution time: mean = 9.180 us, total = 6.435 ms, Queueing time: mean = 69.180 us, max = 363.446 us, min = 9.881 us, total = 48.495 ms [state-dump] NodeManager.GcsCheckAlive - 420 total (1 active), Execution time: mean = 316.106 us, total = 132.765 ms, Queueing time: mean = 613.061 us, max = 2.263 ms, min = 6.025 us, total = 257.486 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 420 total (0 active), Execution time: mean = 53.771 us, total = 22.584 ms, Queueing time: mean = 102.847 us, max = 307.469 us, min = 11.913 us, total = 43.196 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 420 total (0 active), Execution time: mean = 1.550 ms, total = 651.044 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 420 total (1 active), Execution time: mean = 550.344 us, total = 231.144 ms, Queueing time: mean = 379.880 us, max = 1.812 ms, min = 8.454 us, total = 159.550 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 
total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 210 total (1 active), Execution time: mean = 1.792 ms, total = 376.272 ms, Queueing time: mean = 69.394 us, max = 183.426 us, min = 11.269 us, total = 14.573 ms [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, 
total = 9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 35 total (1 active, 1 running), Execution time: mean = 2.720 ms, total = 95.209 ms, Queueing time: mean = 62.397 us, max = 150.858 us, min = 13.745 us, total = 2.184 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:21:50,443 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:21:50,562 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 189331 total (35 active) [state-dump] Queueing time: mean = 213.212 ms, max = 1921.160 s, min = -0.000 s, total = 40367.574 s [state-dump] Execution time: mean = 9.697 ms, total = 1835.947 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 45357 total (0 active), Execution time: mean = 36.518 us, total = 1.656 s, Queueing time: mean = 104.954 us, max = 3.225 ms, min = 1.438 us, total = 4.760 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 45357 total (0 active), Execution time: mean = 521.294 us, total = 23.644 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 21580 total (1 active), Execution time: mean = 3.153 us, total = 68.034 ms, Queueing time: mean = 98.400 us, max = 51.449 ms, min = 3.386 us, total = 2.123 s [state-dump] RaySyncer.OnDemandBroadcasting - 21580 total (1 active), Execution time: mean = 11.434 us, total = 246.749 ms, Queueing time: mean = 91.157 us, max = 51.440 ms, min = 7.347 us, total = 1.967 s [state-dump] ObjectManager.UpdateAvailableMemory - 21579 total (0 active), Execution time: mean = 5.794 us, total = 125.026 ms, Queueing time: mean = 101.754 us, max = 837.793 us, min = 2.098 us, total = 2.196 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 10796 total (1 active), Execution time: mean = 18.853 us, total = 203.542 ms, Queueing time: mean = 75.551 us, max = 13.722 ms, min = 9.036 us, total = 815.653 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8623 total (1 active), Execution time: mean = 453.235 us, total = 3.908 s, Queueing time: mean = 73.913 us, max = 978.705 us, min = -0.000 s, total = 637.349 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2160 total (1 active), Execution time: mean = 9.256 us, total = 19.993 ms, Queueing time: mean = 179.381 us, max = 2.380 ms, min = 160.000 ns, total = 387.464 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2160 total (1 active), Execution time: mean = 15.289 us, total = 33.025 ms, Queueing time: mean = 64.367 us, max = 2.582 ms, min = 6.718 us, total = 139.033 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2160 total (1 active), Execution time: mean = 2.944 us, total = 6.359 ms, Queueing time: mean = 183.462 us, max = 2.379 ms, min = 4.508 us, total = 396.277 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2159 total (0 active), Execution time: mean = 104.280 us, total = 225.141 ms, Queueing time: mean = 108.628 us, max = 1.188 ms, min = 5.071 us, total = 234.528 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2159 total (0 active), Execution time: mean = 632.052 us, total = 1.365 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 721 total (1 active), Execution time: mean = 9.178 us, total = 6.617 ms, Queueing time: mean = 69.346 us, max = 363.446 us, min = 9.881 us, total = 49.998 ms [state-dump] NodeManager.GcsCheckAlive - 432 total (1 active), Execution time: mean = 315.803 us, total = 136.427 ms, Queueing time: mean = 613.607 us, max = 2.263 ms, min = 6.025 us, total = 265.078 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 432 total (0 active), Execution time: mean = 53.748 us, total = 23.219 ms, Queueing time: mean = 103.219 us, max = 307.469 us, min = 11.913 us, total = 44.590 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 432 total (0 active), Execution time: mean = 1.549 ms, total = 669.081 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 432 total (1 active), Execution time: mean = 550.557 us, total = 237.841 ms, Queueing time: mean = 379.982 us, max = 1.812 ms, min = 8.454 us, total = 164.152 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 216 total (1 active), Execution time: mean = 1.792 ms, total = 386.971 ms, Queueing time: mean = 69.446 us, max = 183.426 us, min = 11.269 us, total = 15.000 ms [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 
100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, total = 9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 36 total (1 active, 1 running), Execution time: mean = 2.719 ms, total = 97.891 ms, Queueing time: mean = 62.088 us, max = 150.858 us, min = 13.745 us, total = 2.235 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:22:50,443 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:22:50,566 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, 
node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 194562 total (35 active) [state-dump] Queueing time: mean = 207.481 ms, 
max = 1921.160 s, min = -0.000 s, total = 40367.935 s [state-dump] Execution time: mean = 9.441 ms, total = 1836.801 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 46617 total (0 active), Execution time: mean = 36.392 us, total = 1.696 s, Queueing time: mean = 104.802 us, max = 3.225 ms, min = 1.438 us, total = 4.886 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 46617 total (0 active), Execution time: mean = 520.414 us, total = 24.260 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 22179 total (1 active), Execution time: mean = 3.148 us, total = 69.824 ms, Queueing time: mean = 98.100 us, max = 51.449 ms, min = 3.386 us, total = 2.176 s [state-dump] RaySyncer.OnDemandBroadcasting - 22179 total (1 active), Execution time: mean = 11.387 us, total = 252.554 ms, Queueing time: mean = 90.897 us, max = 51.440 ms, min = 7.347 us, total = 2.016 s [state-dump] ObjectManager.UpdateAvailableMemory - 22178 total (0 active), Execution time: mean = 5.773 us, total = 128.040 ms, Queueing time: mean = 101.551 us, max = 1.031 ms, min = 2.098 us, total = 2.252 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11096 total (1 active), Execution time: mean = 18.858 us, total = 209.252 ms, Queueing time: mean = 75.290 us, max = 13.722 ms, min = 9.036 us, total = 835.415 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8862 total (1 active), Execution time: mean = 452.709 us, total = 4.012 s, Queueing time: mean = 73.731 us, max = 978.705 us, min = -0.000 s, total = 653.400 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2220 total (1 active), Execution time: mean = 9.221 us, total = 20.471 ms, Queueing time: mean = 178.912 us, max = 2.380 ms, min = 160.000 ns, total = 397.185 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2220 total (1 active), Execution time: mean = 
15.254 us, total = 33.863 ms, Queueing time: mean = 64.349 us, max = 2.582 ms, min = 6.718 us, total = 142.855 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2220 total (1 active), Execution time: mean = 2.937 us, total = 6.519 ms, Queueing time: mean = 182.977 us, max = 2.379 ms, min = 4.508 us, total = 406.210 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2219 total (0 active), Execution time: mean = 103.705 us, total = 230.121 ms, Queueing time: mean = 108.323 us, max = 1.188 ms, min = 4.918 us, total = 240.370 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2219 total (0 active), Execution time: mean = 629.470 us, total = 1.397 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 741 total (1 active), Execution time: mean = 9.149 us, total = 6.780 ms, Queueing time: mean = 69.233 us, max = 363.446 us, min = 9.881 us, total = 51.302 ms [state-dump] NodeManager.GcsCheckAlive - 444 total (1 active), Execution time: mean = 314.967 us, total = 139.845 ms, Queueing time: mean = 611.731 us, max = 2.263 ms, min = 6.025 us, total = 271.608 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 444 total (0 active), Execution time: mean = 53.677 us, total = 23.833 ms, Queueing time: mean = 103.382 us, max = 307.469 us, min = 11.913 us, total = 45.902 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 444 total (0 active), Execution time: mean = 1.545 ms, total = 685.762 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 444 total (1 active), Execution time: mean = 549.788 us, total = 244.106 ms, Queueing time: mean = 377.942 us, max = 1.812 ms, min = 8.454 us, total = 167.806 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 222 total (1 
active), Execution time: mean = 1.787 ms, total = 396.703 ms, Queueing time: mean = 69.103 us, max = 183.426 us, min = 11.269 us, total = 15.341 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, total = 
9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 37 total (1 active, 1 running), Execution time: mean = 2.724 ms, total = 100.783 ms, Queueing time: mean = 61.570 us, max = 150.858 us, min = 13.745 us, total = 2.278 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 
ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:23:50,443 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:23:50,569 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{GPU: 20000, node:192.168.0.2: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "available": {object_store_memory: 21474836480000, memory: 846480855040000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 199797 total (35 active) [state-dump] Queueing time: mean = 202.047 ms, max = 1921.160 s, min = -0.000 s, total = 40368.332 s [state-dump] Execution time: mean = 9.198 ms, total = 1837.680 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 47877 total (0 active), Execution time: mean = 36.396 us, total = 1.743 s, Queueing time: mean = 104.811 us, max = 3.225 ms, min = 1.438 us, total = 5.018 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 47877 total (0 active), Execution time: mean = 519.672 us, total = 24.880 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 22779 total (1 active), Execution time: mean = 3.156 us, total = 71.901 ms, Queueing time: mean = 98.261 us, max = 51.449 ms, min = 3.386 us, total = 2.238 s [state-dump] RaySyncer.OnDemandBroadcasting - 22779 total (1 active), Execution time: mean = 11.414 us, total = 259.998 ms, Queueing time: mean = 91.037 us, max = 51.440 ms, min = 7.347 us, total = 2.074 s [state-dump] ObjectManager.UpdateAvailableMemory - 22778 total (0 active), Execution time: mean = 5.780 us, total = 131.663 ms, Queueing time: mean = 101.247 us, max = 1.031 ms, min = 2.098 us, total = 2.306 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 11396 total (1 active), Execution time: mean = 18.915 us, total = 215.554 ms, Queueing time: mean = 75.421 us, max = 13.722 ms, min = 9.036 us, total = 859.493 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9102 total (1 active), Execution time: mean = 452.767 us, total = 4.121 s, Queueing time: mean = 73.925 us, max = 978.705 us, min = -0.000 s, total = 672.864 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2280 total (1 active), Execution time: mean = 9.248 us, total = 21.086 ms, Queueing time: mean = 178.913 us, max = 2.380 ms, min = 160.000 ns, total = 407.922 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2280 total (1 active), Execution time: mean = 15.287 us, total = 34.855 ms, Queueing time: mean = 64.677 us, max = 2.582 ms, min = 6.718 us, total = 147.464 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2280 total (1 active), Execution time: mean = 2.939 us, total = 6.702 ms, Queueing time: mean = 182.998 us, max = 2.379 ms, min = 4.508 us, total = 417.235 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2279 total (0 active), Execution time: mean = 103.422 us, total = 235.699 ms, Queueing time: mean = 108.273 us, max = 1.188 ms, min = 4.918 us, total = 246.755 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2279 total (0 active), Execution time: mean = 627.596 us, total = 1.430 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 761 total (1 active), Execution time: mean = 9.168 us, total = 6.977 ms, Queueing time: mean = 69.348 us, max = 363.446 us, min = 9.881 us, total = 52.774 ms [state-dump] NodeManager.GcsCheckAlive - 456 total (1 active), Execution time: mean = 316.480 us, total = 144.315 ms, Queueing time: mean = 610.320 us, max = 2.263 ms, min = 6.025 us, total = 278.306 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 456 total (0 active), Execution time: mean = 53.692 us, total = 24.484 ms, Queueing time: mean = 102.736 us, max = 307.469 us, min = 11.913 us, total = 46.847 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 456 total (0 active), Execution time: mean = 1.543 ms, total = 703.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 456 total (1 active), Execution time: mean = 548.702 us, total = 250.208 ms, Queueing time: mean = 378.474 us, max = 1.812 ms, min = 8.454 us, total = 172.584 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 228 total (1 active), Execution time: mean = 1.785 ms, total = 407.057 ms, Queueing time: mean = 69.243 us, max = 183.426 us, min = 11.269 us, total = 15.787 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 219 total (21 active), Execution time: mean = 8.326 us, total = 1.823 ms, Queueing time: mean = 184.261 s, max = 1921.160 s, min = 23.644 us, total = 40353.262 s [state-dump] ClientConnection.async_read.ProcessMessage - 198 total (0 active), Execution time: mean = 394.041 us, total = 78.020 ms, Queueing time: mean = 19.652 us, max = 494.085 us, min = 2.397 us, total = 3.891 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 63 total (0 active), Execution time: mean = 63.909 ms, total = 4.026 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 63 total (0 active), Execution time: mean = 97.618 us, total = 6.150 ms, Queueing time: mean = 177.793 us, max = 674.029 us, min = 6.921 us, total = 11.201 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 53 total (0 active), Execution time: mean = 104.901 us, total = 5.560 ms, Queueing time: mean = 
100.521 us, max = 252.805 us, min = 19.400 us, total = 5.328 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 53 total (0 active), Execution time: mean = 582.203 us, total = 30.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 53 total (0 active), Execution time: mean = 34.850 us, total = 1.847 ms, Queueing time: mean = 172.571 us, max = 539.776 us, min = 15.433 us, total = 9.146 ms [state-dump] - 44 total (0 active), Execution time: mean = 914.568 ns, total = 40.241 us, Queueing time: mean = 85.708 us, max = 237.802 us, min = 20.527 us, total = 3.771 ms [state-dump] RaySyncer.BroadcastMessage - 44 total (0 active), Execution time: mean = 213.926 us, total = 9.413 ms, Queueing time: mean = 675.227 ns, max = 1.076 us, min = 91.000 ns, total = 29.710 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 38 total (1 active, 1 running), Execution time: mean = 2.733 ms, total = 103.848 ms, Queueing time: mean = 62.289 us, max = 150.858 us, min = 13.745 us, total = 2.367 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:24:50,444 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:24:50,572 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 205120 total (35 active) [state-dump] Queueing time: mean = 216.966 ms, 
max = 1921.160 s, min = -0.000 s, total = 44504.135 s [state-dump] Execution time: mean = 8.964 ms, total = 1838.594 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 49137 total (0 active), Execution time: mean = 36.428 us, total = 1.790 s, Queueing time: mean = 104.837 us, max = 3.225 ms, min = 1.438 us, total = 5.151 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 49137 total (0 active), Execution time: mean = 519.206 us, total = 25.512 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 23378 total (1 active), Execution time: mean = 3.161 us, total = 73.900 ms, Queueing time: mean = 98.437 us, max = 51.449 ms, min = 3.386 us, total = 2.301 s [state-dump] RaySyncer.OnDemandBroadcasting - 23378 total (1 active), Execution time: mean = 11.574 us, total = 270.585 ms, Queueing time: mean = 91.058 us, max = 51.440 ms, min = 7.347 us, total = 2.129 s [state-dump] ObjectManager.UpdateAvailableMemory - 23377 total (0 active), Execution time: mean = 5.783 us, total = 135.200 ms, Queueing time: mean = 100.799 us, max = 1.031 ms, min = 2.098 us, total = 2.356 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11696 total (1 active), Execution time: mean = 18.940 us, total = 221.523 ms, Queueing time: mean = 75.502 us, max = 13.722 ms, min = 5.381 us, total = 883.075 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9341 total (1 active), Execution time: mean = 452.653 us, total = 4.228 s, Queueing time: mean = 74.044 us, max = 978.705 us, min = -0.000 s, total = 691.641 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2340 total (1 active), Execution time: mean = 9.285 us, total = 21.727 ms, Queueing time: mean = 178.759 us, max = 2.380 ms, min = 160.000 ns, total = 418.296 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2340 total (1 active), Execution time: mean = 
15.324 us, total = 35.858 ms, Queueing time: mean = 64.824 us, max = 2.582 ms, min = 6.718 us, total = 151.687 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2340 total (1 active), Execution time: mean = 2.941 us, total = 6.883 ms, Queueing time: mean = 182.867 us, max = 2.379 ms, min = 4.508 us, total = 427.909 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2339 total (0 active), Execution time: mean = 103.455 us, total = 241.982 ms, Queueing time: mean = 108.307 us, max = 1.188 ms, min = 4.918 us, total = 253.331 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2339 total (0 active), Execution time: mean = 626.182 us, total = 1.465 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 781 total (1 active), Execution time: mean = 9.182 us, total = 7.171 ms, Queueing time: mean = 69.283 us, max = 363.446 us, min = 7.807 us, total = 54.110 ms [state-dump] NodeManager.GcsCheckAlive - 468 total (1 active), Execution time: mean = 317.328 us, total = 148.509 ms, Queueing time: mean = 609.317 us, max = 2.263 ms, min = 6.025 us, total = 285.160 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 468 total (0 active), Execution time: mean = 53.749 us, total = 25.155 ms, Queueing time: mean = 102.448 us, max = 307.469 us, min = 11.913 us, total = 47.946 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 468 total (0 active), Execution time: mean = 1.543 ms, total = 722.025 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 468 total (1 active), Execution time: mean = 548.534 us, total = 256.714 ms, Queueing time: mean = 378.793 us, max = 1.812 ms, min = 8.454 us, total = 177.275 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 234 total (1 
active), Execution time: mean = 1.785 ms, total = 417.664 ms, Queueing time: mean = 69.606 us, max = 183.426 us, min = 11.269 us, total = 16.288 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 39 total (1 active, 1 running), Execution time: mean = 2.736 ms, total = 106.722 ms, Queueing time: mean = 62.673 us, max = 150.858 us, min = 13.745 us, total = 2.444 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.728 s, total = 1798.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 392.023 us, total = 1.568 ms, Queueing time: mean = 146.930 us, max = 238.879 us, min = 70.382 us, total = 587.719 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:25:50,444 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:25:50,573 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 210354 total (35 active) [state-dump] Queueing time: mean = 211.570 ms, max = 1921.160 s, min = -0.000 s, total = 44504.565 s [state-dump] Execution time: mean = 11.597 ms, total = 2439.533 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 50397 total (0 active), Execution time: mean = 36.458 us, total = 1.837 s, Queueing time: mean = 105.052 us, max = 3.225 ms, min = 1.438 us, total = 5.294 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 50397 total (0 active), Execution time: mean = 519.533 us, total = 26.183 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 23977 total (1 active), Execution time: mean = 3.166 us, total = 75.904 ms, Queueing time: mean = 98.739 us, max = 51.449 ms, min = 3.386 us, total = 2.367 s [state-dump] RaySyncer.OnDemandBroadcasting - 23977 total (1 active), Execution time: mean = 11.597 us, total = 278.071 ms, Queueing time: mean = 91.343 us, max = 51.440 ms, min = 7.347 us, total = 2.190 s [state-dump] ObjectManager.UpdateAvailableMemory - 23976 total (0 active), Execution time: mean = 5.845 us, total = 140.150 ms, Queueing time: mean = 100.990 us, max = 1.031 ms, min = 2.098 us, total = 2.421 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 11996 total (1 active), Execution time: mean = 19.041 us, total = 228.421 ms, Queueing time: mean = 75.724 us, max = 13.722 ms, min = 5.381 us, total = 908.381 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9581 total (1 active), Execution time: mean = 452.650 us, total = 4.337 s, Queueing time: mean = 74.253 us, max = 978.705 us, min = -0.000 s, total = 711.419 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2400 total (1 active), Execution time: mean = 9.327 us, total = 22.384 ms, Queueing time: mean = 178.962 us, max = 2.380 ms, min = 160.000 ns, total = 429.508 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2400 total (1 active), Execution time: mean = 15.362 us, total = 36.869 ms, Queueing time: mean = 65.115 us, max = 2.582 ms, min = 6.718 us, total = 156.276 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2400 total (1 active), Execution time: mean = 2.941 us, total = 7.058 ms, Queueing time: mean = 183.098 us, max = 2.379 ms, min = 4.508 us, total = 439.434 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2399 total (0 active), Execution time: mean = 103.264 us, total = 247.729 ms, Queueing time: mean = 108.452 us, max = 1.188 ms, min = 4.918 us, total = 260.176 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2399 total (0 active), Execution time: mean = 625.572 us, total = 1.501 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 801 total (1 active), Execution time: mean = 9.257 us, total = 7.415 ms, Queueing time: mean = 69.855 us, max = 363.446 us, min = 7.807 us, total = 55.954 ms [state-dump] NodeManager.GcsCheckAlive - 480 total (1 active), Execution time: mean = 318.302 us, total = 152.785 ms, Queueing time: mean = 609.313 us, max = 2.263 ms, min = 6.025 us, total = 292.470 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 480 total (0 active), Execution time: mean = 53.755 us, total = 25.803 ms, Queueing time: mean = 102.451 us, max = 307.469 us, min = 11.913 us, total = 49.177 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 480 total (0 active), Execution time: mean = 1.545 ms, total = 741.394 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 480 total (1 active), Execution time: mean = 549.811 us, total = 263.909 ms, Queueing time: mean = 378.300 us, max = 1.812 ms, min = 8.454 us, total = 181.584 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 240 total (1 active), Execution time: mean = 1.784 ms, total = 428.255 ms, Queueing time: mean = 70.075 us, max = 183.426 us, min = 11.269 us, total = 16.818 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 
101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total = 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 40 total (1 active, 1 running), Execution time: mean = 2.751 ms, total = 110.046 ms, Queueing time: mean = 62.682 us, max = 150.858 us, min = 13.745 us, total = 2.507 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:26:50,444 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:26:50,576 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 215587 total (35 active) [state-dump] Queueing time: mean = 206.436 ms, 
max = 1921.160 s, min = -0.000 s, total = 44504.999 s [state-dump] Execution time: mean = 11.320 ms, total = 2440.516 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 51657 total (0 active), Execution time: mean = 36.531 us, total = 1.887 s, Queueing time: mean = 105.388 us, max = 3.225 ms, min = 1.438 us, total = 5.444 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 51657 total (0 active), Execution time: mean = 520.618 us, total = 26.894 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 24577 total (1 active), Execution time: mean = 3.170 us, total = 77.913 ms, Queueing time: mean = 98.958 us, max = 51.449 ms, min = 3.386 us, total = 2.432 s [state-dump] RaySyncer.OnDemandBroadcasting - 24577 total (1 active), Execution time: mean = 11.620 us, total = 285.596 ms, Queueing time: mean = 91.546 us, max = 51.440 ms, min = 7.347 us, total = 2.250 s [state-dump] ObjectManager.UpdateAvailableMemory - 24576 total (0 active), Execution time: mean = 5.867 us, total = 144.182 ms, Queueing time: mean = 101.233 us, max = 1.031 ms, min = 2.098 us, total = 2.488 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12295 total (1 active), Execution time: mean = 19.105 us, total = 234.894 ms, Queueing time: mean = 75.880 us, max = 13.722 ms, min = 5.381 us, total = 932.944 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9820 total (1 active), Execution time: mean = 453.144 us, total = 4.450 s, Queueing time: mean = 74.386 us, max = 978.705 us, min = -0.000 s, total = 730.471 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2460 total (1 active), Execution time: mean = 9.392 us, total = 23.104 ms, Queueing time: mean = 179.018 us, max = 2.380 ms, min = 160.000 ns, total = 440.385 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2460 total (1 active), Execution time: mean = 
15.400 us, total = 37.883 ms, Queueing time: mean = 65.378 us, max = 2.582 ms, min = 6.718 us, total = 160.830 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2460 total (1 active), Execution time: mean = 2.945 us, total = 7.244 ms, Queueing time: mean = 183.198 us, max = 2.379 ms, min = 4.508 us, total = 450.666 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2459 total (0 active), Execution time: mean = 103.177 us, total = 253.713 ms, Queueing time: mean = 108.738 us, max = 1.188 ms, min = 4.918 us, total = 267.387 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2459 total (0 active), Execution time: mean = 625.717 us, total = 1.539 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 821 total (1 active), Execution time: mean = 9.290 us, total = 7.627 ms, Queueing time: mean = 70.132 us, max = 363.446 us, min = 7.807 us, total = 57.578 ms [state-dump] NodeManager.GcsCheckAlive - 492 total (1 active), Execution time: mean = 319.029 us, total = 156.962 ms, Queueing time: mean = 609.190 us, max = 2.263 ms, min = 6.025 us, total = 299.721 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 492 total (0 active), Execution time: mean = 53.897 us, total = 26.517 ms, Queueing time: mean = 102.578 us, max = 307.469 us, min = 11.913 us, total = 50.468 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 492 total (0 active), Execution time: mean = 1.547 ms, total = 761.241 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 492 total (1 active), Execution time: mean = 549.842 us, total = 270.522 ms, Queueing time: mean = 379.021 us, max = 1.812 ms, min = 8.454 us, total = 186.478 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 246 total (1 
active), Execution time: mean = 1.787 ms, total = 439.704 ms, Queueing time: mean = 70.622 us, max = 183.426 us, min = 11.269 us, total = 17.373 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 41 total (1 active, 1 running), Execution time: mean = 2.716 ms, total = 111.344 ms, Queueing time: mean = 62.284 us, max = 150.858 us, min = 13.745 us, total = 2.554 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:27:50,445 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:27:50,579 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 220819 total (35 active) [state-dump] Queueing time: mean = 201.547 ms, max = 1921.160 s, min = -0.001 s, total = 44505.439 s [state-dump] Execution time: mean = 11.057 ms, total = 2441.507 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 52917 total (0 active), Execution time: mean = 36.613 us, total = 1.937 s, Queueing time: mean = 105.683 us, max = 3.225 ms, min = 1.438 us, total = 5.592 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 52917 total (0 active), Execution time: mean = 521.660 us, total = 27.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 25176 total (1 active), Execution time: mean = 3.173 us, total = 79.885 ms, Queueing time: mean = 99.231 us, max = 51.449 ms, min = 3.386 us, total = 2.498 s [state-dump] RaySyncer.OnDemandBroadcasting - 25176 total (1 active), Execution time: mean = 11.642 us, total = 293.091 ms, Queueing time: mean = 91.803 us, max = 51.440 ms, min = 7.347 us, total = 2.311 s [state-dump] ObjectManager.UpdateAvailableMemory - 25175 total (0 active), Execution time: mean = 5.889 us, total = 148.266 ms, Queueing time: mean = 101.649 us, max = 1.031 ms, min = 2.098 us, total = 2.559 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 12595 total (1 active), Execution time: mean = 19.156 us, total = 241.272 ms, Queueing time: mean = 75.965 us, max = 13.722 ms, min = 5.381 us, total = 956.779 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10060 total (1 active), Execution time: mean = 453.801 us, total = 4.565 s, Queueing time: mean = 74.547 us, max = 1.472 ms, min = -0.001 s, total = 749.943 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2520 total (1 active), Execution time: mean = 9.403 us, total = 23.695 ms, Queueing time: mean = 179.214 us, max = 2.380 ms, min = 160.000 ns, total = 451.620 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2520 total (1 active), Execution time: mean = 15.454 us, total = 38.943 ms, Queueing time: mean = 65.537 us, max = 2.582 ms, min = 6.718 us, total = 165.153 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2520 total (1 active), Execution time: mean = 2.947 us, total = 7.425 ms, Queueing time: mean = 183.397 us, max = 2.379 ms, min = 4.508 us, total = 462.162 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2519 total (0 active), Execution time: mean = 103.208 us, total = 259.981 ms, Queueing time: mean = 108.913 us, max = 1.188 ms, min = 4.918 us, total = 274.352 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2519 total (0 active), Execution time: mean = 626.118 us, total = 1.577 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 841 total (1 active), Execution time: mean = 9.304 us, total = 7.825 ms, Queueing time: mean = 70.274 us, max = 363.446 us, min = 7.807 us, total = 59.100 ms [state-dump] NodeManager.GcsCheckAlive - 504 total (1 active), Execution time: mean = 319.863 us, total = 161.211 ms, Queueing time: mean = 609.602 us, max = 2.263 ms, min = 6.025 us, total = 307.239 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 504 total (0 active), Execution time: mean = 54.082 us, total = 27.257 ms, Queueing time: mean = 102.737 us, max = 307.469 us, min = 11.913 us, total = 51.779 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 504 total (0 active), Execution time: mean = 1.551 ms, total = 781.933 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 504 total (1 active), Execution time: mean = 550.660 us, total = 277.533 ms, Queueing time: mean = 379.500 us, max = 1.812 ms, min = 8.454 us, total = 191.268 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 252 total (1 active), Execution time: mean = 1.789 ms, total = 450.953 ms, Queueing time: mean = 70.358 us, max = 183.426 us, min = 11.269 us, total = 17.730 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 
101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total = 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 42 total (1 active, 1 running), Execution time: mean = 2.721 ms, total = 114.263 ms, Queueing time: mean = 62.250 us, max = 150.858 us, min = 13.745 us, total = 2.614 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:28:50,445 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:28:50,582 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 226050 total (35 active) [state-dump] Queueing time: mean = 196.885 ms, 
max = 1921.160 s, min = -0.001 s, total = 44505.866 s [state-dump] Execution time: mean = 10.805 ms, total = 2442.462 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 54177 total (0 active), Execution time: mean = 36.647 us, total = 1.985 s, Queueing time: mean = 105.940 us, max = 3.225 ms, min = 1.438 us, total = 5.740 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 54177 total (0 active), Execution time: mean = 522.201 us, total = 28.291 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 25775 total (1 active), Execution time: mean = 3.174 us, total = 81.805 ms, Queueing time: mean = 99.239 us, max = 51.449 ms, min = 3.386 us, total = 2.558 s [state-dump] RaySyncer.OnDemandBroadcasting - 25775 total (1 active), Execution time: mean = 11.637 us, total = 299.936 ms, Queueing time: mean = 91.817 us, max = 51.440 ms, min = 7.347 us, total = 2.367 s [state-dump] ObjectManager.UpdateAvailableMemory - 25774 total (0 active), Execution time: mean = 5.904 us, total = 152.159 ms, Queueing time: mean = 101.993 us, max = 1.031 ms, min = 2.098 us, total = 2.629 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12895 total (1 active), Execution time: mean = 19.179 us, total = 247.317 ms, Queueing time: mean = 76.195 us, max = 13.722 ms, min = 5.381 us, total = 982.530 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10299 total (1 active), Execution time: mean = 454.123 us, total = 4.677 s, Queueing time: mean = 74.629 us, max = 1.472 ms, min = -0.001 s, total = 768.603 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2580 total (1 active), Execution time: mean = 9.418 us, total = 24.299 ms, Queueing time: mean = 179.570 us, max = 2.380 ms, min = 160.000 ns, total = 463.290 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2580 total (1 active), Execution time: mean = 
15.462 us, total = 39.892 ms, Queueing time: mean = 65.625 us, max = 2.582 ms, min = 6.718 us, total = 169.314 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2580 total (1 active), Execution time: mean = 2.950 us, total = 7.611 ms, Queueing time: mean = 183.759 us, max = 2.379 ms, min = 4.508 us, total = 474.099 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2579 total (0 active), Execution time: mean = 103.092 us, total = 265.876 ms, Queueing time: mean = 109.114 us, max = 1.188 ms, min = 4.918 us, total = 281.406 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2579 total (0 active), Execution time: mean = 625.753 us, total = 1.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 861 total (1 active), Execution time: mean = 9.314 us, total = 8.019 ms, Queueing time: mean = 70.662 us, max = 363.446 us, min = 7.807 us, total = 60.840 ms [state-dump] NodeManager.GcsCheckAlive - 516 total (1 active), Execution time: mean = 320.407 us, total = 165.330 ms, Queueing time: mean = 610.832 us, max = 2.263 ms, min = 6.025 us, total = 315.189 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 516 total (0 active), Execution time: mean = 54.138 us, total = 27.935 ms, Queueing time: mean = 102.843 us, max = 307.469 us, min = 11.913 us, total = 53.067 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 516 total (0 active), Execution time: mean = 1.554 ms, total = 801.653 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 516 total (1 active), Execution time: mean = 550.902 us, total = 284.265 ms, Queueing time: mean = 381.032 us, max = 1.812 ms, min = 8.454 us, total = 196.613 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 258 total (1 
active), Execution time: mean = 1.793 ms, total = 462.716 ms, Queueing time: mean = 70.455 us, max = 183.426 us, min = 11.269 us, total = 18.177 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 43 total (1 active, 1 running), Execution time: mean = 2.721 ms, total = 117.020 ms, Queueing time: mean = 62.794 us, max = 150.858 us, min = 13.745 us, total = 2.700 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:29:50,445 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:29:50,587 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 231285 total (35 active) [state-dump] Queueing time: mean = 192.431 ms, max = 1921.160 s, min = -0.001 s, total = 44506.310 s [state-dump] Execution time: mean = 10.565 ms, total = 2443.445 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 55437 total (0 active), Execution time: mean = 36.701 us, total = 2.035 s, Queueing time: mean = 106.290 us, max = 3.225 ms, min = 1.438 us, total = 5.892 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 55437 total (0 active), Execution time: mean = 523.092 us, total = 28.999 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 26375 total (1 active), Execution time: mean = 3.176 us, total = 83.774 ms, Queueing time: mean = 99.464 us, max = 51.449 ms, min = 3.386 us, total = 2.623 s [state-dump] RaySyncer.OnDemandBroadcasting - 26375 total (1 active), Execution time: mean = 11.652 us, total = 307.321 ms, Queueing time: mean = 92.030 us, max = 51.440 ms, min = 7.347 us, total = 2.427 s [state-dump] ObjectManager.UpdateAvailableMemory - 26374 total (0 active), Execution time: mean = 5.923 us, total = 156.209 ms, Queueing time: mean = 102.387 us, max = 1.031 ms, min = 2.098 us, total = 2.700 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 13195 total (1 active), Execution time: mean = 19.226 us, total = 253.688 ms, Queueing time: mean = 76.272 us, max = 13.722 ms, min = 5.381 us, total = 1.006 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10539 total (1 active), Execution time: mean = 454.531 us, total = 4.790 s, Queueing time: mean = 74.692 us, max = 1.472 ms, min = -0.001 s, total = 787.174 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2640 total (1 active), Execution time: mean = 9.432 us, total = 24.900 ms, Queueing time: mean = 179.889 us, max = 2.380 ms, min = 160.000 ns, total = 474.908 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2640 total (1 active), Execution time: mean = 15.457 us, total = 40.807 ms, Queueing time: mean = 65.630 us, max = 2.582 ms, min = 6.718 us, total = 173.264 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2640 total (1 active), Execution time: mean = 2.954 us, total = 7.798 ms, Queueing time: mean = 184.084 us, max = 2.379 ms, min = 4.508 us, total = 485.982 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2639 total (0 active), Execution time: mean = 103.103 us, total = 272.090 ms, Queueing time: mean = 109.393 us, max = 1.188 ms, min = 4.918 us, total = 288.688 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2639 total (0 active), Execution time: mean = 626.038 us, total = 1.652 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 881 total (1 active), Execution time: mean = 9.307 us, total = 8.199 ms, Queueing time: mean = 70.605 us, max = 363.446 us, min = 7.807 us, total = 62.203 ms [state-dump] NodeManager.GcsCheckAlive - 528 total (1 active), Execution time: mean = 321.771 us, total = 169.895 ms, Queueing time: mean = 611.229 us, max = 2.263 ms, min = 6.025 us, total = 322.729 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 528 total (0 active), Execution time: mean = 54.288 us, total = 28.664 ms, Queueing time: mean = 103.308 us, max = 307.469 us, min = 11.913 us, total = 54.547 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 528 total (0 active), Execution time: mean = 1.557 ms, total = 821.907 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 528 total (1 active), Execution time: mean = 551.862 us, total = 291.383 ms, Queueing time: mean = 381.818 us, max = 1.812 ms, min = 8.454 us, total = 201.600 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 264 total (1 active), Execution time: mean = 1.796 ms, total = 474.192 ms, Queueing time: mean = 71.066 us, max = 183.426 us, min = 11.269 us, total = 18.761 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 
101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total = 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 44 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 119.983 ms, Queueing time: mean = 63.344 us, max = 150.858 us, min = 13.745 us, total = 2.787 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 7.528 us, total = 22.583 us, Queueing time: mean = 52.348 us, max = 87.645 us, min = 69.398 us, total = 157.043 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:30:50,446 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:30:50,590 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 236517 total (35 active) [state-dump] Queueing time: mean = 188.176 ms, 
max = 1921.160 s, min = -0.001 s, total = 44506.759 s [state-dump] Execution time: mean = 10.335 ms, total = 2444.432 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 56697 total (0 active), Execution time: mean = 36.758 us, total = 2.084 s, Queueing time: mean = 106.616 us, max = 3.225 ms, min = 1.438 us, total = 6.045 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 56697 total (0 active), Execution time: mean = 523.962 us, total = 29.707 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 26974 total (1 active), Execution time: mean = 3.181 us, total = 85.797 ms, Queueing time: mean = 99.630 us, max = 51.449 ms, min = 3.386 us, total = 2.687 s [state-dump] RaySyncer.OnDemandBroadcasting - 26974 total (1 active), Execution time: mean = 11.654 us, total = 314.357 ms, Queueing time: mean = 92.199 us, max = 51.440 ms, min = 7.347 us, total = 2.487 s [state-dump] ObjectManager.UpdateAvailableMemory - 26973 total (0 active), Execution time: mean = 5.942 us, total = 160.264 ms, Queueing time: mean = 102.804 us, max = 1.031 ms, min = 2.098 us, total = 2.773 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13495 total (1 active), Execution time: mean = 19.262 us, total = 259.942 ms, Queueing time: mean = 76.328 us, max = 13.722 ms, min = 5.381 us, total = 1.030 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10778 total (1 active), Execution time: mean = 455.026 us, total = 4.904 s, Queueing time: mean = 74.882 us, max = 1.472 ms, min = -0.001 s, total = 807.083 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2700 total (1 active), Execution time: mean = 9.481 us, total = 25.599 ms, Queueing time: mean = 180.608 us, max = 2.380 ms, min = 160.000 ns, total = 487.641 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2700 total (1 active), Execution time: mean = 
15.516 us, total = 41.894 ms, Queueing time: mean = 65.953 us, max = 2.582 ms, min = 6.718 us, total = 178.073 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2700 total (1 active), Execution time: mean = 2.961 us, total = 7.995 ms, Queueing time: mean = 184.832 us, max = 2.379 ms, min = 4.508 us, total = 499.047 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2699 total (0 active), Execution time: mean = 103.087 us, total = 278.232 ms, Queueing time: mean = 109.668 us, max = 1.188 ms, min = 4.918 us, total = 295.995 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2699 total (0 active), Execution time: mean = 626.258 us, total = 1.690 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 901 total (1 active), Execution time: mean = 9.316 us, total = 8.393 ms, Queueing time: mean = 70.875 us, max = 363.446 us, min = 7.807 us, total = 63.859 ms [state-dump] NodeManager.GcsCheckAlive - 540 total (1 active), Execution time: mean = 322.624 us, total = 174.217 ms, Queueing time: mean = 613.877 us, max = 2.320 ms, min = 6.025 us, total = 331.494 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 540 total (0 active), Execution time: mean = 54.490 us, total = 29.425 ms, Queueing time: mean = 103.194 us, max = 307.469 us, min = 11.913 us, total = 55.725 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 540 total (0 active), Execution time: mean = 1.559 ms, total = 841.874 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 540 total (1 active), Execution time: mean = 553.037 us, total = 298.640 ms, Queueing time: mean = 384.153 us, max = 1.903 ms, min = 8.454 us, total = 207.443 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 270 total (1 
active), Execution time: mean = 1.803 ms, total = 486.851 ms, Queueing time: mean = 71.416 us, max = 183.426 us, min = 11.269 us, total = 19.282 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 45 total (1 active, 1 running), Execution time: mean = 2.770 ms, total = 124.646 ms, Queueing time: mean = 63.175 us, max = 150.858 us, min = 13.745 us, total = 2.843 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:31:50,446 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:31:50,593 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 241752 total (35 active) [state-dump] Queueing time: mean = 184.103 ms, max = 1921.160 s, min = -0.001 s, total = 44507.167 s [state-dump] Execution time: mean = 10.115 ms, total = 2445.384 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 57957 total (0 active), Execution time: mean = 36.762 us, total = 2.131 s, Queueing time: mean = 106.720 us, max = 3.225 ms, min = 1.438 us, total = 6.185 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 57957 total (0 active), Execution time: mean = 524.370 us, total = 30.391 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 27574 total (1 active), Execution time: mean = 3.180 us, total = 87.674 ms, Queueing time: mean = 99.570 us, max = 51.449 ms, min = 3.386 us, total = 2.746 s [state-dump] RaySyncer.OnDemandBroadcasting - 27574 total (1 active), Execution time: mean = 11.644 us, total = 321.061 ms, Queueing time: mean = 92.150 us, max = 51.440 ms, min = 7.347 us, total = 2.541 s [state-dump] ObjectManager.UpdateAvailableMemory - 27573 total (0 active), Execution time: mean = 5.944 us, total = 163.892 ms, Queueing time: mean = 102.931 us, max = 1.031 ms, min = 2.098 us, total = 2.838 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 13795 total (1 active), Execution time: mean = 19.286 us, total = 266.050 ms, Queueing time: mean = 76.379 us, max = 13.722 ms, min = 5.381 us, total = 1.054 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11018 total (1 active), Execution time: mean = 455.276 us, total = 5.016 s, Queueing time: mean = 74.895 us, max = 1.472 ms, min = -0.001 s, total = 825.195 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2760 total (1 active), Execution time: mean = 9.470 us, total = 26.139 ms, Queueing time: mean = 180.672 us, max = 2.380 ms, min = 160.000 ns, total = 498.656 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2760 total (1 active), Execution time: mean = 15.513 us, total = 42.816 ms, Queueing time: mean = 66.125 us, max = 2.582 ms, min = 6.718 us, total = 182.505 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2760 total (1 active), Execution time: mean = 2.958 us, total = 8.165 ms, Queueing time: mean = 184.892 us, max = 2.379 ms, min = 4.508 us, total = 510.301 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2759 total (0 active), Execution time: mean = 102.987 us, total = 284.141 ms, Queueing time: mean = 109.776 us, max = 1.188 ms, min = 4.918 us, total = 302.873 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2759 total (0 active), Execution time: mean = 626.332 us, total = 1.728 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 921 total (1 active), Execution time: mean = 9.292 us, total = 8.558 ms, Queueing time: mean = 70.788 us, max = 363.446 us, min = 7.807 us, total = 65.195 ms [state-dump] NodeManager.GcsCheckAlive - 552 total (1 active), Execution time: mean = 322.921 us, total = 178.252 ms, Queueing time: mean = 614.382 us, max = 2.320 ms, min = 6.025 us, total = 339.139 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 552 total (0 active), Execution time: mean = 54.563 us, total = 30.119 ms, Queueing time: mean = 103.387 us, max = 307.469 us, min = 11.913 us, total = 57.070 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 552 total (0 active), Execution time: mean = 1.560 ms, total = 861.234 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 552 total (1 active), Execution time: mean = 553.480 us, total = 305.521 ms, Queueing time: mean = 384.624 us, max = 1.903 ms, min = 8.454 us, total = 212.313 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 276 total (1 active), Execution time: mean = 1.805 ms, total = 498.142 ms, Queueing time: mean = 71.523 us, max = 183.426 us, min = 11.269 us, total = 19.740 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 
101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total = 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 46 total (1 active, 1 running), Execution time: mean = 2.782 ms, total = 127.953 ms, Queueing time: mean = 63.631 us, max = 150.858 us, min = 13.745 us, total = 2.927 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:32:50,446 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:32:50,596 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 246983 total (35 active) [state-dump] Queueing time: mean = 180.205 ms, 
max = 1921.160 s, min = -0.001 s, total = 44507.584 s [state-dump] Execution time: mean = 9.905 ms, total = 2446.305 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 59217 total (0 active), Execution time: mean = 36.742 us, total = 2.176 s, Queueing time: mean = 106.784 us, max = 3.225 ms, min = 1.438 us, total = 6.323 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 59217 total (0 active), Execution time: mean = 524.373 us, total = 31.052 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 28173 total (1 active), Execution time: mean = 3.179 us, total = 89.557 ms, Queueing time: mean = 99.632 us, max = 51.449 ms, min = 3.386 us, total = 2.807 s [state-dump] RaySyncer.OnDemandBroadcasting - 28173 total (1 active), Execution time: mean = 11.648 us, total = 328.162 ms, Queueing time: mean = 92.207 us, max = 51.440 ms, min = 7.347 us, total = 2.598 s [state-dump] ObjectManager.UpdateAvailableMemory - 28172 total (0 active), Execution time: mean = 5.948 us, total = 167.557 ms, Queueing time: mean = 103.105 us, max = 1.031 ms, min = 2.098 us, total = 2.905 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14095 total (1 active), Execution time: mean = 19.271 us, total = 271.630 ms, Queueing time: mean = 76.398 us, max = 13.722 ms, min = 5.381 us, total = 1.077 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11257 total (1 active), Execution time: mean = 455.136 us, total = 5.123 s, Queueing time: mean = 74.883 us, max = 1.472 ms, min = -0.001 s, total = 842.963 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2820 total (1 active), Execution time: mean = 9.501 us, total = 26.794 ms, Queueing time: mean = 181.021 us, max = 2.380 ms, min = 160.000 ns, total = 510.478 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2820 total (1 active), Execution time: mean = 15.542 
us, total = 43.829 ms, Queueing time: mean = 66.192 us, max = 2.582 ms, min = 6.718 us, total = 186.661 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2820 total (1 active), Execution time: mean = 2.960 us, total = 8.347 ms, Queueing time: mean = 185.260 us, max = 2.379 ms, min = 4.508 us, total = 522.433 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2819 total (0 active), Execution time: mean = 102.796 us, total = 289.781 ms, Queueing time: mean = 110.109 us, max = 1.188 ms, min = 4.918 us, total = 310.398 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2819 total (0 active), Execution time: mean = 626.174 us, total = 1.765 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 941 total (1 active), Execution time: mean = 9.297 us, total = 8.749 ms, Queueing time: mean = 70.948 us, max = 363.446 us, min = 7.807 us, total = 66.762 ms [state-dump] NodeManager.GcsCheckAlive - 564 total (1 active), Execution time: mean = 323.186 us, total = 182.277 ms, Queueing time: mean = 615.651 us, max = 2.320 ms, min = 6.025 us, total = 347.227 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 564 total (0 active), Execution time: mean = 54.570 us, total = 30.777 ms, Queueing time: mean = 103.577 us, max = 307.469 us, min = 11.913 us, total = 58.417 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 564 total (0 active), Execution time: mean = 1.561 ms, total = 880.497 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 564 total (1 active), Execution time: mean = 553.334 us, total = 312.081 ms, Queueing time: mean = 386.123 us, max = 1.903 ms, min = 8.454 us, total = 217.774 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 282 total (1 
active), Execution time: mean = 1.807 ms, total = 509.568 ms, Queueing time: mean = 71.715 us, max = 183.426 us, min = 11.269 us, total = 20.224 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 47 total (1 active, 1 running), Execution time: mean = 2.782 ms, total = 130.742 ms, Queueing time: mean = 63.561 us, max = 150.858 us, min = 13.745 us, total = 2.987 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:33:50,446 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:33:50,599 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 252213 total (35 active) [state-dump] Queueing time: mean = 176.470 ms, max = 1921.160 s, min = -0.001 s, total = 44507.965 s [state-dump] Execution time: mean = 9.703 ms, total = 2447.211 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 60476 total (0 active), Execution time: mean = 36.729 us, total = 2.221 s, Queueing time: mean = 106.785 us, max = 3.225 ms, min = 1.438 us, total = 6.458 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 60476 total (0 active), Execution time: mean = 524.095 us, total = 31.695 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 28772 total (1 active), Execution time: mean = 3.180 us, total = 91.491 ms, Queueing time: mean = 99.574 us, max = 51.449 ms, min = 3.386 us, total = 2.865 s [state-dump] RaySyncer.OnDemandBroadcasting - 28772 total (1 active), Execution time: mean = 11.645 us, total = 335.057 ms, Queueing time: mean = 92.152 us, max = 51.440 ms, min = 7.347 us, total = 2.651 s [state-dump] ObjectManager.UpdateAvailableMemory - 28771 total (0 active), Execution time: mean = 5.946 us, total = 171.074 ms, Queueing time: mean = 102.785 us, max = 1.031 ms, min = 2.098 us, total = 2.957 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 14395 total (1 active), Execution time: mean = 19.259 us, total = 277.237 ms, Queueing time: mean = 76.365 us, max = 13.722 ms, min = 5.381 us, total = 1.099 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11497 total (1 active), Execution time: mean = 455.486 us, total = 5.237 s, Queueing time: mean = 74.881 us, max = 1.472 ms, min = -0.001 s, total = 860.909 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2880 total (1 active), Execution time: mean = 9.504 us, total = 27.371 ms, Queueing time: mean = 180.358 us, max = 2.380 ms, min = 160.000 ns, total = 519.431 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2880 total (1 active), Execution time: mean = 15.534 us, total = 44.737 ms, Queueing time: mean = 66.460 us, max = 2.582 ms, min = 6.718 us, total = 191.406 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2880 total (1 active), Execution time: mean = 2.961 us, total = 8.526 ms, Queueing time: mean = 184.601 us, max = 2.379 ms, min = 4.508 us, total = 531.650 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2879 total (0 active), Execution time: mean = 102.736 us, total = 295.776 ms, Queueing time: mean = 110.227 us, max = 1.188 ms, min = 4.918 us, total = 317.343 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2879 total (0 active), Execution time: mean = 625.703 us, total = 1.801 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 961 total (1 active), Execution time: mean = 9.303 us, total = 8.940 ms, Queueing time: mean = 71.172 us, max = 363.446 us, min = 7.807 us, total = 68.396 ms [state-dump] NodeManager.GcsCheckAlive - 576 total (1 active), Execution time: mean = 322.937 us, total = 186.012 ms, Queueing time: mean = 612.823 us, max = 2.320 ms, min = 6.025 us, total = 352.986 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 576 total (0 active), Execution time: mean = 54.491 us, total = 31.387 ms, Queueing time: mean = 103.627 us, max = 307.469 us, min = 11.913 us, total = 59.689 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 576 total (0 active), Execution time: mean = 1.560 ms, total = 898.678 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 576 total (1 active), Execution time: mean = 552.495 us, total = 318.237 ms, Queueing time: mean = 384.016 us, max = 1.903 ms, min = 8.454 us, total = 221.193 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 288 total (1 active), Execution time: mean = 1.803 ms, total = 519.162 ms, Queueing time: mean = 71.658 us, max = 183.426 us, min = 11.269 us, total = 20.638 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 
101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total = 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 48 total (1 active, 1 running), Execution time: mean = 2.788 ms, total = 133.843 ms, Queueing time: mean = 64.015 us, max = 150.858 us, min = 13.745 us, total = 3.073 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:34:50,447 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:34:50,602 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 846480855040000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 257446 total (35 active) [state-dump] Queueing time: mean = 172.884 ms, 
max = 1921.160 s, min = -0.001 s, total = 44508.329 s [state-dump] Execution time: mean = 9.509 ms, total = 2448.053 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 61736 total (0 active), Execution time: mean = 36.632 us, total = 2.262 s, Queueing time: mean = 106.668 us, max = 3.225 ms, min = 1.438 us, total = 6.585 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 61736 total (0 active), Execution time: mean = 523.118 us, total = 32.295 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 29372 total (1 active), Execution time: mean = 3.177 us, total = 93.326 ms, Queueing time: mean = 99.328 us, max = 51.449 ms, min = 3.386 us, total = 2.917 s [state-dump] RaySyncer.OnDemandBroadcasting - 29372 total (1 active), Execution time: mean = 11.615 us, total = 341.162 ms, Queueing time: mean = 91.933 us, max = 51.440 ms, min = 7.347 us, total = 2.700 s [state-dump] ObjectManager.UpdateAvailableMemory - 29371 total (0 active), Execution time: mean = 5.930 us, total = 174.171 ms, Queueing time: mean = 102.432 us, max = 1.031 ms, min = 2.098 us, total = 3.009 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14694 total (1 active), Execution time: mean = 19.222 us, total = 282.450 ms, Queueing time: mean = 76.264 us, max = 13.722 ms, min = 5.381 us, total = 1.121 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11736 total (1 active), Execution time: mean = 455.068 us, total = 5.341 s, Queueing time: mean = 74.746 us, max = 1.472 ms, min = -0.001 s, total = 877.215 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2940 total (1 active), Execution time: mean = 9.480 us, total = 27.870 ms, Queueing time: mean = 180.200 us, max = 2.380 ms, min = 160.000 ns, total = 529.788 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2940 total (1 active), Execution time: mean = 15.499 
us, total = 45.566 ms, Queueing time: mean = 66.539 us, max = 2.582 ms, min = 6.718 us, total = 195.626 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2940 total (1 active), Execution time: mean = 2.958 us, total = 8.698 ms, Queueing time: mean = 184.430 us, max = 2.379 ms, min = 4.508 us, total = 542.224 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2939 total (0 active), Execution time: mean = 102.495 us, total = 301.233 ms, Queueing time: mean = 110.119 us, max = 1.188 ms, min = 4.918 us, total = 323.641 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2939 total (0 active), Execution time: mean = 624.460 us, total = 1.835 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 981 total (1 active), Execution time: mean = 9.261 us, total = 9.085 ms, Queueing time: mean = 71.084 us, max = 363.446 us, min = 7.807 us, total = 69.733 ms [state-dump] NodeManager.GcsCheckAlive - 588 total (1 active), Execution time: mean = 322.670 us, total = 189.730 ms, Queueing time: mean = 612.565 us, max = 2.320 ms, min = 6.025 us, total = 360.188 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 588 total (0 active), Execution time: mean = 54.460 us, total = 32.022 ms, Queueing time: mean = 103.393 us, max = 307.469 us, min = 11.913 us, total = 60.795 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 588 total (0 active), Execution time: mean = 1.558 ms, total = 916.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 588 total (1 active), Execution time: mean = 551.438 us, total = 324.246 ms, Queueing time: mean = 384.009 us, max = 1.903 ms, min = 8.454 us, total = 225.798 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 294 total (1 
active), Execution time: mean = 1.800 ms, total = 529.122 ms, Queueing time: mean = 71.315 us, max = 183.426 us, min = 11.269 us, total = 20.966 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 230 total (21 active), Execution time: mean = 8.359 us, total = 1.923 ms, Queueing time: mean = 193.429 s, max = 1921.160 s, min = 23.644 us, total = 44488.667 s [state-dump] ClientConnection.async_read.ProcessMessage - 209 total (0 active), Execution time: mean = 373.917 us, total = 78.149 ms, Queueing time: mean = 20.530 us, max = 494.085 us, min = 2.397 us, total = 4.291 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 73 total (0 active), Execution time: mean = 55.270 ms, total = 4.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 73 total (0 active), Execution time: mean = 96.595 us, total = 7.051 ms, Queueing time: mean = 187.306 us, max = 674.029 us, min = 6.921 us, total = 13.673 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 63 total (0 active), Execution time: mean = 106.443 us, total = 6.706 ms, Queueing time: mean = 101.429 us, max = 252.805 us, min = 19.400 us, total = 6.390 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 63 total (0 active), Execution time: mean = 589.417 us, total = 37.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 63 total (0 active), Execution time: mean = 36.303 us, total = 2.287 ms, Queueing time: mean = 182.020 us, max = 539.776 us, min = 15.433 us, total = 11.467 ms [state-dump] - 54 total (0 active), Execution time: mean = 933.944 ns, total = 50.433 us, Queueing time: mean = 90.350 us, max = 237.802 us, min = 20.527 us, total = 4.879 ms [state-dump] RaySyncer.BroadcastMessage - 54 total (0 active), Execution time: mean = 218.069 us, total 
= 11.776 ms, Queueing time: mean = 704.815 ns, max = 1.206 us, min = 91.000 ns, total = 38.060 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 49 total (1 active, 1 running), Execution time: mean = 2.792 ms, total = 136.827 ms, Queueing time: mean = 64.141 us, max = 150.858 us, min = 13.745 us, total = 3.143 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.774 s, total = 2398.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 410.500 us, total = 2.053 ms, Queueing time: mean = 128.097 us, max = 238.879 us, min = 52.768 us, total = 640.487 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:35:50,447 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:35:50,605 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 262774 total (35 active) [state-dump] Queueing time: mean = 196.545 ms, max = 1921.160 s, min = -0.001 s, total = 51647.037 s [state-dump] Execution time: mean = 11.603 ms, total = 3048.917 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 62996 total (0 active), Execution time: mean = 36.576 us, total = 2.304 s, Queueing time: mean = 106.416 us, max = 3.225 ms, min = 1.438 us, total = 6.704 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 62996 total (0 active), Execution time: mean = 522.160 us, total = 32.894 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 29971 total (1 active), Execution time: mean = 3.172 us, total = 95.079 ms, Queueing time: mean = 99.047 us, max = 51.449 ms, min = 3.386 us, total = 2.969 s [state-dump] RaySyncer.OnDemandBroadcasting - 29971 total (1 active), Execution time: mean = 11.674 us, total = 349.877 ms, Queueing time: mean = 91.587 us, max = 51.440 ms, min = 7.347 us, total = 2.745 s [state-dump] ObjectManager.UpdateAvailableMemory - 29970 total (0 active), Execution time: mean = 5.910 us, total = 177.127 ms, Queueing time: mean = 102.227 us, max = 1.031 ms, min = 2.098 us, total = 3.064 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 14994 total (1 active), Execution time: mean = 19.154 us, total = 287.195 ms, Queueing time: mean = 76.157 us, max = 13.722 ms, min = 5.381 us, total = 1.142 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11976 total (1 active), Execution time: mean = 454.550 us, total = 5.444 s, Queueing time: mean = 74.579 us, max = 1.472 ms, min = -0.001 s, total = 893.156 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3000 total (1 active), Execution time: mean = 9.462 us, total = 28.386 ms, Queueing time: mean = 180.205 us, max = 2.380 ms, min = 160.000 ns, total = 540.615 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3000 total (1 active), Execution time: mean = 15.449 us, total = 46.348 ms, Queueing time: mean = 66.357 us, max = 2.582 ms, min = 6.718 us, total = 199.070 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3000 total (1 active), Execution time: mean = 2.955 us, total = 8.866 ms, Queueing time: mean = 184.426 us, max = 2.379 ms, min = 4.508 us, total = 553.279 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2999 total (0 active), Execution time: mean = 102.360 us, total = 306.979 ms, Queueing time: mean = 109.849 us, max = 1.188 ms, min = 4.918 us, total = 329.436 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2999 total (0 active), Execution time: mean = 622.849 us, total = 1.868 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1001 total (1 active), Execution time: mean = 9.250 us, total = 9.259 ms, Queueing time: mean = 71.054 us, max = 363.446 us, min = 7.807 us, total = 71.125 ms [state-dump] NodeManager.GcsCheckAlive - 600 total (1 active), Execution time: mean = 321.896 us, total = 193.137 ms, Queueing time: mean = 613.393 us, max = 2.320 ms, min = 6.025 us, total = 368.036 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 600 total (0 active), Execution time: mean = 54.314 us, total = 32.589 ms, Queueing time: mean = 103.499 us, max = 307.469 us, min = 11.913 us, total = 62.100 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 600 total (0 active), Execution time: mean = 1.555 ms, total = 932.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 600 total (1 active), Execution time: mean = 550.212 us, total = 330.127 ms, Queueing time: mean = 385.771 us, max = 1.903 ms, min = 8.454 us, total = 231.463 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 300 total (1 active), Execution time: mean = 1.801 ms, total = 540.370 ms, Queueing time: mean = 71.261 us, max = 183.426 us, min = 11.269 us, total = 21.378 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 50 total (1 active, 1 running), Execution time: mean = 2.791 ms, total = 139.567 ms, Queueing time: mean = 65.155 us, max = 150.858 us, min = 13.745 us, total = 3.258 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:36:50,447 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:36:50,608 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 268006 total (35 active) [state-dump] Queueing time: mean = 192.710 ms, 
max = 1921.160 s, min = -0.001 s, total = 51647.457 s [state-dump] Execution time: mean = 11.380 ms, total = 3049.863 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 64255 total (0 active), Execution time: mean = 36.614 us, total = 2.353 s, Queueing time: mean = 106.533 us, max = 3.225 ms, min = 1.438 us, total = 6.845 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 64255 total (0 active), Execution time: mean = 522.445 us, total = 33.570 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 30571 total (1 active), Execution time: mean = 3.173 us, total = 97.007 ms, Queueing time: mean = 99.084 us, max = 51.449 ms, min = 3.386 us, total = 3.029 s [state-dump] RaySyncer.OnDemandBroadcasting - 30571 total (1 active), Execution time: mean = 11.671 us, total = 356.783 ms, Queueing time: mean = 91.628 us, max = 51.440 ms, min = 7.347 us, total = 2.801 s [state-dump] ObjectManager.UpdateAvailableMemory - 30570 total (0 active), Execution time: mean = 5.919 us, total = 180.931 ms, Queueing time: mean = 102.450 us, max = 1.031 ms, min = 2.098 us, total = 3.132 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15294 total (1 active), Execution time: mean = 19.152 us, total = 292.903 ms, Queueing time: mean = 76.140 us, max = 13.722 ms, min = 5.381 us, total = 1.164 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12215 total (1 active), Execution time: mean = 454.913 us, total = 5.557 s, Queueing time: mean = 74.577 us, max = 1.472 ms, min = -0.001 s, total = 910.957 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3060 total (1 active), Execution time: mean = 9.491 us, total = 29.044 ms, Queueing time: mean = 180.683 us, max = 2.380 ms, min = 160.000 ns, total = 552.890 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3060 total (1 active), Execution time: mean = 
15.458 us, total = 47.303 ms, Queueing time: mean = 66.482 us, max = 2.582 ms, min = 6.718 us, total = 203.436 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3060 total (1 active), Execution time: mean = 2.958 us, total = 9.052 ms, Queueing time: mean = 184.922 us, max = 2.379 ms, min = 4.508 us, total = 565.863 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3059 total (0 active), Execution time: mean = 102.225 us, total = 312.707 ms, Queueing time: mean = 110.095 us, max = 1.188 ms, min = 4.918 us, total = 336.781 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3059 total (0 active), Execution time: mean = 622.437 us, total = 1.904 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1021 total (1 active), Execution time: mean = 9.253 us, total = 9.447 ms, Queueing time: mean = 71.136 us, max = 363.446 us, min = 7.807 us, total = 72.630 ms [state-dump] NodeManager.GcsCheckAlive - 612 total (1 active), Execution time: mean = 322.450 us, total = 197.339 ms, Queueing time: mean = 614.986 us, max = 2.320 ms, min = 6.025 us, total = 376.372 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 612 total (0 active), Execution time: mean = 54.498 us, total = 33.353 ms, Queueing time: mean = 103.934 us, max = 307.469 us, min = 11.913 us, total = 63.607 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 612 total (0 active), Execution time: mean = 1.557 ms, total = 953.014 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 612 total (1 active), Execution time: mean = 552.100 us, total = 337.885 ms, Queueing time: mean = 385.737 us, max = 1.903 ms, min = 8.454 us, total = 236.071 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 306 total 
(1 active), Execution time: mean = 1.803 ms, total = 551.701 ms, Queueing time: mean = 71.463 us, max = 183.426 us, min = 11.269 us, total = 21.868 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, 
total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 51 total (1 active, 1 running), Execution time: mean = 2.796 ms, total = 142.576 ms, Queueing time: mean = 65.160 us, max = 150.858 us, min = 13.745 us, total = 3.323 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total 
= 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:37:50,448 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:37:50,612 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 273238 total (35 active) [state-dump] Queueing time: mean = 189.022 ms, max = 1921.160 s, min = -0.001 s, total = 51647.898 s [state-dump] Execution time: mean = 11.166 ms, total = 3050.842 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 65515 total (0 active), Execution time: mean = 36.655 us, total = 2.401 s, Queueing time: mean = 106.835 us, max = 3.225 ms, min = 1.438 us, total = 6.999 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 65515 total (0 active), Execution time: mean = 523.187 us, total = 34.277 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 31170 total (1 active), Execution time: mean = 3.175 us, total = 98.968 ms, Queueing time: mean = 99.206 us, max = 51.449 ms, min = 3.386 us, total = 3.092 s [state-dump] RaySyncer.OnDemandBroadcasting - 31170 total (1 active), Execution time: mean = 11.678 us, total = 364.012 ms, Queueing time: mean = 91.746 us, max = 51.440 ms, min = 7.347 us, total = 2.860 s [state-dump] ObjectManager.UpdateAvailableMemory - 31169 total (0 active), Execution time: mean = 5.934 us, total = 184.953 ms, Queueing time: mean = 102.786 us, max = 1.031 ms, min = 2.098 us, total = 3.204 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 15594 total (1 active), Execution time: mean = 19.194 us, total = 299.305 ms, Queueing time: mean = 76.243 us, max = 13.722 ms, min = 5.381 us, total = 1.189 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12455 total (1 active), Execution time: mean = 455.146 us, total = 5.669 s, Queueing time: mean = 74.700 us, max = 1.472 ms, min = -0.001 s, total = 930.394 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3120 total (1 active), Execution time: mean = 9.509 us, total = 29.668 ms, Queueing time: mean = 180.728 us, max = 2.380 ms, min = 160.000 ns, total = 563.872 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3120 total (1 active), Execution time: mean = 15.453 us, total = 48.214 ms, Queueing time: mean = 66.611 us, max = 2.582 ms, min = 6.718 us, total = 207.828 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3120 total (1 active), Execution time: mean = 2.959 us, total = 9.232 ms, Queueing time: mean = 184.975 us, max = 2.379 ms, min = 4.508 us, total = 577.121 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3119 total (0 active), Execution time: mean = 102.161 us, total = 318.641 ms, Queueing time: mean = 110.354 us, max = 1.188 ms, min = 4.918 us, total = 344.195 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3119 total (0 active), Execution time: mean = 622.705 us, total = 1.942 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1041 total (1 active), Execution time: mean = 9.255 us, total = 9.634 ms, Queueing time: mean = 71.337 us, max = 363.446 us, min = 7.807 us, total = 74.262 ms [state-dump] NodeManager.GcsCheckAlive - 624 total (1 active), Execution time: mean = 322.653 us, total = 201.335 ms, Queueing time: mean = 615.246 us, max = 2.320 ms, min = 6.025 us, total = 383.913 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 624 total (0 active), Execution time: mean = 54.690 us, total = 34.126 ms, Queueing time: mean = 104.134 us, max = 307.469 us, min = 11.913 us, total = 64.980 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 624 total (0 active), Execution time: mean = 1.558 ms, total = 972.397 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 624 total (1 active), Execution time: mean = 552.159 us, total = 344.547 ms, Queueing time: mean = 386.216 us, max = 1.903 ms, min = 8.454 us, total = 240.998 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 312 total (1 active), Execution time: mean = 1.805 ms, total = 563.022 ms, Queueing time: mean = 72.250 us, max = 183.426 us, min = 11.269 us, total = 22.542 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 52 total (1 active, 1 running), Execution time: mean = 2.802 ms, total = 145.686 ms, Queueing time: mean = 65.286 us, max = 150.858 us, min = 13.745 us, total = 3.395 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:38:50,448 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:38:50,614 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 278472 total (35 active) [state-dump] Queueing time: mean = 185.471 ms, 
max = 1921.160 s, min = -0.001 s, total = 51648.350 s [state-dump] Execution time: mean = 10.959 ms, total = 3051.836 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 66775 total (0 active), Execution time: mean = 36.711 us, total = 2.451 s, Queueing time: mean = 107.047 us, max = 3.225 ms, min = 1.438 us, total = 7.148 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 66775 total (0 active), Execution time: mean = 524.032 us, total = 34.992 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 31770 total (1 active), Execution time: mean = 3.180 us, total = 101.039 ms, Queueing time: mean = 99.505 us, max = 51.449 ms, min = 3.386 us, total = 3.161 s [state-dump] RaySyncer.OnDemandBroadcasting - 31770 total (1 active), Execution time: mean = 11.701 us, total = 371.742 ms, Queueing time: mean = 92.027 us, max = 51.440 ms, min = 7.347 us, total = 2.924 s [state-dump] ObjectManager.UpdateAvailableMemory - 31769 total (0 active), Execution time: mean = 5.952 us, total = 189.081 ms, Queueing time: mean = 103.078 us, max = 1.031 ms, min = 2.098 us, total = 3.275 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15894 total (1 active), Execution time: mean = 19.236 us, total = 305.734 ms, Queueing time: mean = 76.404 us, max = 13.722 ms, min = 5.381 us, total = 1.214 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12694 total (1 active), Execution time: mean = 455.549 us, total = 5.783 s, Queueing time: mean = 74.798 us, max = 1.472 ms, min = -0.001 s, total = 949.486 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3180 total (1 active), Execution time: mean = 9.532 us, total = 30.310 ms, Queueing time: mean = 181.177 us, max = 2.380 ms, min = 160.000 ns, total = 576.143 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3180 total (1 active), Execution time: mean = 
15.445 us, total = 49.114 ms, Queueing time: mean = 66.637 us, max = 2.582 ms, min = 6.718 us, total = 211.905 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3180 total (1 active), Execution time: mean = 2.966 us, total = 9.430 ms, Queueing time: mean = 185.434 us, max = 2.379 ms, min = 4.508 us, total = 589.679 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3179 total (0 active), Execution time: mean = 102.189 us, total = 324.859 ms, Queueing time: mean = 110.658 us, max = 1.188 ms, min = 4.918 us, total = 351.782 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3179 total (0 active), Execution time: mean = 623.298 us, total = 1.981 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1061 total (1 active), Execution time: mean = 9.255 us, total = 9.820 ms, Queueing time: mean = 71.672 us, max = 363.446 us, min = 7.807 us, total = 76.044 ms [state-dump] NodeManager.GcsCheckAlive - 636 total (1 active), Execution time: mean = 323.253 us, total = 205.589 ms, Queueing time: mean = 616.694 us, max = 2.320 ms, min = 6.025 us, total = 392.218 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 636 total (0 active), Execution time: mean = 54.864 us, total = 34.894 ms, Queueing time: mean = 104.454 us, max = 307.469 us, min = 11.913 us, total = 66.433 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 636 total (0 active), Execution time: mean = 1.560 ms, total = 992.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 636 total (1 active), Execution time: mean = 552.900 us, total = 351.644 ms, Queueing time: mean = 387.513 us, max = 1.903 ms, min = 8.454 us, total = 246.458 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 318 total 
(1 active), Execution time: mean = 1.808 ms, total = 574.897 ms, Queueing time: mean = 72.343 us, max = 183.426 us, min = 11.269 us, total = 23.005 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, 
total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 53 total (1 active, 1 running), Execution time: mean = 2.807 ms, total = 148.782 ms, Queueing time: mean = 65.706 us, max = 150.858 us, min = 13.745 us, total = 3.482 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total 
= 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:39:50,448 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:39:50,617 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 283702 total (35 active) [state-dump] Queueing time: mean = 182.053 ms, max = 1921.160 s, min = -0.001 s, total = 51648.784 s [state-dump] Execution time: mean = 10.761 ms, total = 3052.820 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 68034 total (0 active), Execution time: mean = 36.734 us, total = 2.499 s, Queueing time: mean = 107.292 us, max = 3.225 ms, min = 1.438 us, total = 7.300 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 68034 total (0 active), Execution time: mean = 524.779 us, total = 35.703 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 32369 total (1 active), Execution time: mean = 3.182 us, total = 102.992 ms, Queueing time: mean = 99.539 us, max = 51.449 ms, min = 3.386 us, total = 3.222 s [state-dump] RaySyncer.OnDemandBroadcasting - 32369 total (1 active), Execution time: mean = 11.703 us, total = 378.803 ms, Queueing time: mean = 92.063 us, max = 51.440 ms, min = 7.347 us, total = 2.980 s [state-dump] ObjectManager.UpdateAvailableMemory - 32368 total (0 active), Execution time: mean = 5.968 us, total = 193.186 ms, Queueing time: mean = 103.400 us, max = 1.031 ms, min = 2.098 us, total = 3.347 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 16194 total (1 active), Execution time: mean = 19.251 us, total = 311.747 ms, Queueing time: mean = 76.481 us, max = 13.722 ms, min = 5.381 us, total = 1.239 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12934 total (1 active), Execution time: mean = 455.944 us, total = 5.897 s, Queueing time: mean = 74.953 us, max = 1.472 ms, min = -0.001 s, total = 969.440 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3240 total (1 active), Execution time: mean = 9.548 us, total = 30.934 ms, Queueing time: mean = 181.193 us, max = 2.380 ms, min = 160.000 ns, total = 587.064 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3240 total (1 active), Execution time: mean = 15.437 us, total = 50.017 ms, Queueing time: mean = 66.769 us, max = 2.582 ms, min = 6.718 us, total = 216.332 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3240 total (1 active), Execution time: mean = 2.967 us, total = 9.613 ms, Queueing time: mean = 185.456 us, max = 2.379 ms, min = 4.508 us, total = 600.879 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3239 total (0 active), Execution time: mean = 102.212 us, total = 331.065 ms, Queueing time: mean = 110.916 us, max = 1.188 ms, min = 4.918 us, total = 359.257 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3239 total (0 active), Execution time: mean = 623.832 us, total = 2.021 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1081 total (1 active), Execution time: mean = 9.238 us, total = 9.986 ms, Queueing time: mean = 71.827 us, max = 363.446 us, min = 7.807 us, total = 77.645 ms [state-dump] NodeManager.GcsCheckAlive - 648 total (1 active), Execution time: mean = 323.589 us, total = 209.686 ms, Queueing time: mean = 616.811 us, max = 2.320 ms, min = 6.025 us, total = 399.693 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 648 total (0 active), Execution time: mean = 54.941 us, total = 35.602 ms, Queueing time: mean = 104.651 us, max = 307.469 us, min = 11.913 us, total = 67.814 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 648 total (0 active), Execution time: mean = 1.561 ms, total = 1.012 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 648 total (1 active), Execution time: mean = 552.928 us, total = 358.297 ms, Queueing time: mean = 387.942 us, max = 1.903 ms, min = 8.454 us, total = 251.386 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 324 total (1 active), Execution time: mean = 1.809 ms, total = 586.219 ms, Queueing time: mean = 72.402 us, max = 183.426 us, min = 11.269 us, total = 23.458 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 54 total (1 active, 1 running), Execution time: mean = 2.806 ms, total = 151.539 ms, Queueing time: mean = 65.657 us, max = 150.858 us, min = 13.745 us, total = 3.545 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:40:50,449 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:40:50,621 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 288933 total (35 active) [state-dump] Queueing time: mean = 178.758 ms, 
max = 1921.160 s, min = -0.001 s, total = 51649.211 s [state-dump] Execution time: mean = 10.569 ms, total = 3053.773 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 69294 total (0 active), Execution time: mean = 36.763 us, total = 2.547 s, Queueing time: mean = 107.453 us, max = 3.225 ms, min = 1.438 us, total = 7.446 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 69294 total (0 active), Execution time: mean = 525.101 us, total = 36.386 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 32968 total (1 active), Execution time: mean = 3.183 us, total = 104.930 ms, Queueing time: mean = 99.570 us, max = 51.449 ms, min = 3.386 us, total = 3.283 s [state-dump] RaySyncer.OnDemandBroadcasting - 32968 total (1 active), Execution time: mean = 11.697 us, total = 385.642 ms, Queueing time: mean = 92.101 us, max = 51.440 ms, min = 7.347 us, total = 3.036 s [state-dump] ObjectManager.UpdateAvailableMemory - 32967 total (0 active), Execution time: mean = 5.977 us, total = 197.035 ms, Queueing time: mean = 103.634 us, max = 1.031 ms, min = 2.098 us, total = 3.417 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16494 total (1 active), Execution time: mean = 19.281 us, total = 318.028 ms, Queueing time: mean = 76.557 us, max = 13.722 ms, min = 5.381 us, total = 1.263 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13173 total (1 active), Execution time: mean = 456.101 us, total = 6.008 s, Queueing time: mean = 74.990 us, max = 1.472 ms, min = -0.001 s, total = 987.845 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3300 total (1 active), Execution time: mean = 9.560 us, total = 31.546 ms, Queueing time: mean = 181.326 us, max = 2.380 ms, min = 160.000 ns, total = 598.376 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3300 total (1 active), Execution time: mean = 
15.438 us, total = 50.945 ms, Queueing time: mean = 66.870 us, max = 2.582 ms, min = 6.718 us, total = 220.670 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3300 total (1 active), Execution time: mean = 2.972 us, total = 9.806 ms, Queueing time: mean = 185.592 us, max = 2.379 ms, min = 4.508 us, total = 612.455 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3299 total (0 active), Execution time: mean = 102.281 us, total = 337.426 ms, Queueing time: mean = 111.094 us, max = 1.188 ms, min = 4.918 us, total = 366.499 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3299 total (0 active), Execution time: mean = 624.355 us, total = 2.060 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1101 total (1 active), Execution time: mean = 9.254 us, total = 10.189 ms, Queueing time: mean = 72.040 us, max = 363.446 us, min = 7.807 us, total = 79.316 ms [state-dump] NodeManager.GcsCheckAlive - 660 total (1 active), Execution time: mean = 323.598 us, total = 213.575 ms, Queueing time: mean = 617.464 us, max = 2.320 ms, min = 6.025 us, total = 407.526 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 660 total (0 active), Execution time: mean = 54.953 us, total = 36.269 ms, Queueing time: mean = 104.661 us, max = 307.469 us, min = 11.913 us, total = 69.077 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 660 total (0 active), Execution time: mean = 1.561 ms, total = 1.030 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 660 total (1 active), Execution time: mean = 552.600 us, total = 364.716 ms, Queueing time: mean = 388.900 us, max = 1.903 ms, min = 8.454 us, total = 256.674 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 330 total (1 
active), Execution time: mean = 1.811 ms, total = 597.521 ms, Queueing time: mean = 72.279 us, max = 183.426 us, min = 11.269 us, total = 23.852 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total 
= 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 55 total (1 active, 1 running), Execution time: mean = 2.808 ms, total = 154.420 ms, Queueing time: mean = 67.273 us, max = 154.519 us, min = 13.745 us, total = 3.700 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:41:50,449 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:41:50,624 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 294168 total (35 active) [state-dump] Queueing time: mean = 175.579 ms, max = 1921.160 s, min = -0.001 s, total = 51649.642 s [state-dump] Execution time: mean = 10.384 ms, total = 3054.730 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 70554 total (0 active), Execution time: mean = 36.807 us, total = 2.597 s, Queueing time: mean = 107.565 us, max = 3.225 ms, min = 1.438 us, total = 7.589 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 70554 total (0 active), Execution time: mean = 525.426 us, total = 37.071 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 33568 total (1 active), Execution time: mean = 3.182 us, total = 106.821 ms, Queueing time: mean = 99.716 us, max = 51.449 ms, min = 3.386 us, total = 3.347 s [state-dump] RaySyncer.OnDemandBroadcasting - 33568 total (1 active), Execution time: mean = 11.705 us, total = 392.926 ms, Queueing time: mean = 92.240 us, max = 51.440 ms, min = 7.347 us, total = 3.096 s [state-dump] ObjectManager.UpdateAvailableMemory - 33567 total (0 active), Execution time: mean = 5.986 us, total = 200.918 ms, Queueing time: mean = 103.899 us, max = 1.031 ms, min = 2.098 us, total = 3.488 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 16794 total (1 active), Execution time: mean = 19.281 us, total = 323.798 ms, Queueing time: mean = 76.553 us, max = 13.722 ms, min = 5.381 us, total = 1.286 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13413 total (1 active), Execution time: mean = 456.175 us, total = 6.119 s, Queueing time: mean = 75.024 us, max = 1.472 ms, min = -0.001 s, total = 1.006 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3360 total (1 active), Execution time: mean = 9.558 us, total = 32.114 ms, Queueing time: mean = 181.392 us, max = 2.380 ms, min = 160.000 ns, total = 609.477 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3360 total (1 active), Execution time: mean = 15.434 us, total = 51.860 ms, Queueing time: mean = 67.139 us, max = 2.582 ms, min = 6.718 us, total = 225.588 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3360 total (1 active), Execution time: mean = 2.973 us, total = 9.991 ms, Queueing time: mean = 185.654 us, max = 2.379 ms, min = 4.508 us, total = 623.796 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3359 total (0 active), Execution time: mean = 102.330 us, total = 343.725 ms, Queueing time: mean = 111.608 us, max = 1.188 ms, min = 4.918 us, total = 374.892 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3359 total (0 active), Execution time: mean = 625.119 us, total = 2.100 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1121 total (1 active), Execution time: mean = 9.264 us, total = 10.385 ms, Queueing time: mean = 72.497 us, max = 363.446 us, min = 7.807 us, total = 81.269 ms [state-dump] NodeManager.GcsCheckAlive - 672 total (1 active), Execution time: mean = 323.763 us, total = 217.569 ms, Queueing time: mean = 617.582 us, max = 2.320 ms, min = 6.025 us, total = 415.015 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 672 total (0 active), Execution time: mean = 54.998 us, total = 36.959 ms, Queueing time: mean = 104.695 us, max = 307.469 us, min = 11.913 us, total = 70.355 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 672 total (0 active), Execution time: mean = 1.562 ms, total = 1.049 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 672 total (1 active), Execution time: mean = 553.475 us, total = 371.935 ms, Queueing time: mean = 388.340 us, max = 1.903 ms, min = 8.454 us, total = 260.964 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 336 total (1 active), Execution time: mean = 1.811 ms, total = 608.446 ms, Queueing time: mean = 72.310 us, max = 183.426 us, min = 11.269 us, total = 24.296 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 56 total (1 active, 1 running), Execution time: mean = 2.812 ms, total = 157.446 ms, Queueing time: mean = 67.943 us, max = 154.519 us, min = 13.745 us, total = 3.805 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:42:50,449 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:42:50,627 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 299398 total (35 active) [state-dump] Queueing time: mean = 172.513 ms, 
max = 1921.160 s, min = -0.001 s, total = 51650.082 s [state-dump] Execution time: mean = 10.206 ms, total = 3055.721 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 71814 total (0 active), Execution time: mean = 36.878 us, total = 2.648 s, Queueing time: mean = 107.774 us, max = 3.225 ms, min = 1.438 us, total = 7.740 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 71814 total (0 active), Execution time: mean = 526.120 us, total = 37.783 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 34167 total (1 active), Execution time: mean = 3.184 us, total = 108.791 ms, Queueing time: mean = 99.771 us, max = 51.449 ms, min = 3.386 us, total = 3.409 s [state-dump] RaySyncer.OnDemandBroadcasting - 34167 total (1 active), Execution time: mean = 11.705 us, total = 399.931 ms, Queueing time: mean = 92.298 us, max = 51.440 ms, min = 7.347 us, total = 3.154 s [state-dump] ObjectManager.UpdateAvailableMemory - 34166 total (0 active), Execution time: mean = 5.998 us, total = 204.914 ms, Queueing time: mean = 104.161 us, max = 1.031 ms, min = 2.098 us, total = 3.559 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17093 total (1 active), Execution time: mean = 19.316 us, total = 330.173 ms, Queueing time: mean = 76.692 us, max = 13.722 ms, min = 5.381 us, total = 1.311 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13652 total (1 active), Execution time: mean = 456.404 us, total = 6.231 s, Queueing time: mean = 75.197 us, max = 1.472 ms, min = -0.001 s, total = 1.027 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3420 total (1 active), Execution time: mean = 9.573 us, total = 32.739 ms, Queueing time: mean = 181.685 us, max = 2.380 ms, min = 160.000 ns, total = 621.362 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3420 total (1 active), Execution time: mean = 15.445 
us, total = 52.823 ms, Queueing time: mean = 67.267 us, max = 2.582 ms, min = 6.718 us, total = 230.055 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3420 total (1 active), Execution time: mean = 2.976 us, total = 10.179 ms, Queueing time: mean = 185.953 us, max = 2.379 ms, min = 4.508 us, total = 635.961 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3419 total (0 active), Execution time: mean = 102.397 us, total = 350.097 ms, Queueing time: mean = 111.932 us, max = 1.188 ms, min = 4.918 us, total = 382.696 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3419 total (0 active), Execution time: mean = 626.178 us, total = 2.141 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1141 total (1 active), Execution time: mean = 9.257 us, total = 10.562 ms, Queueing time: mean = 72.780 us, max = 363.446 us, min = 7.807 us, total = 83.042 ms [state-dump] NodeManager.GcsCheckAlive - 684 total (1 active), Execution time: mean = 324.474 us, total = 221.940 ms, Queueing time: mean = 618.495 us, max = 2.320 ms, min = 6.025 us, total = 423.051 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 684 total (0 active), Execution time: mean = 55.102 us, total = 37.690 ms, Queueing time: mean = 105.090 us, max = 307.469 us, min = 11.913 us, total = 71.882 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 684 total (0 active), Execution time: mean = 1.563 ms, total = 1.069 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 684 total (1 active), Execution time: mean = 554.319 us, total = 379.154 ms, Queueing time: mean = 389.014 us, max = 1.903 ms, min = 8.454 us, total = 266.085 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 342 total (1 
active), Execution time: mean = 1.813 ms, total = 620.136 ms, Queueing time: mean = 72.504 us, max = 183.426 us, min = 11.269 us, total = 24.796 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total 
= 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 57 total (1 active, 1 running), Execution time: mean = 2.815 ms, total = 160.477 ms, Queueing time: mean = 68.250 us, max = 154.519 us, min = 13.745 us, total = 3.890 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:43:50,450 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:43:50,630 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 304631 total (35 active) [state-dump] Queueing time: mean = 169.551 ms, max = 1921.160 s, min = -0.001 s, total = 51650.490 s [state-dump] Execution time: mean = 10.034 ms, total = 3056.602 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 73073 total (0 active), Execution time: mean = 36.830 us, total = 2.691 s, Queueing time: mean = 107.842 us, max = 3.225 ms, min = 1.438 us, total = 7.880 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 73073 total (0 active), Execution time: mean = 525.684 us, total = 38.413 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 34767 total (1 active), Execution time: mean = 3.181 us, total = 110.600 ms, Queueing time: mean = 99.720 us, max = 51.449 ms, min = 3.386 us, total = 3.467 s [state-dump] RaySyncer.OnDemandBroadcasting - 34767 total (1 active), Execution time: mean = 11.688 us, total = 406.363 ms, Queueing time: mean = 92.262 us, max = 51.440 ms, min = 7.347 us, total = 3.208 s [state-dump] ObjectManager.UpdateAvailableMemory - 34766 total (0 active), Execution time: mean = 5.992 us, total = 208.301 ms, Queueing time: mean = 104.334 us, max = 1.031 ms, min = 2.098 us, total = 3.627 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 17393 total (1 active), Execution time: mean = 19.292 us, total = 335.551 ms, Queueing time: mean = 76.667 us, max = 13.722 ms, min = 5.381 us, total = 1.333 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13892 total (1 active), Execution time: mean = 456.130 us, total = 6.337 s, Queueing time: mean = 75.079 us, max = 1.472 ms, min = -0.001 s, total = 1.043 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3480 total (1 active), Execution time: mean = 9.558 us, total = 33.263 ms, Queueing time: mean = 181.604 us, max = 2.380 ms, min = 160.000 ns, total = 631.983 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3480 total (1 active), Execution time: mean = 15.406 us, total = 53.614 ms, Queueing time: mean = 67.229 us, max = 2.582 ms, min = 6.718 us, total = 233.958 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3480 total (1 active), Execution time: mean = 2.972 us, total = 10.343 ms, Queueing time: mean = 185.866 us, max = 2.379 ms, min = 4.508 us, total = 646.814 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3479 total (0 active), Execution time: mean = 102.231 us, total = 355.662 ms, Queueing time: mean = 112.010 us, max = 1.188 ms, min = 4.918 us, total = 389.683 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3479 total (0 active), Execution time: mean = 625.606 us, total = 2.176 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1161 total (1 active), Execution time: mean = 9.239 us, total = 10.726 ms, Queueing time: mean = 72.859 us, max = 363.446 us, min = 7.807 us, total = 84.589 ms [state-dump] NodeManager.GcsCheckAlive - 696 total (1 active), Execution time: mean = 324.290 us, total = 225.706 ms, Queueing time: mean = 618.349 us, max = 2.320 ms, min = 6.025 us, total = 430.371 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 696 total (0 active), Execution time: mean = 55.078 us, total = 38.334 ms, Queueing time: mean = 105.327 us, max = 307.469 us, min = 11.913 us, total = 73.307 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 696 total (0 active), Execution time: mean = 1.561 ms, total = 1.086 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 696 total (1 active), Execution time: mean = 553.695 us, total = 385.372 ms, Queueing time: mean = 389.361 us, max = 1.903 ms, min = 8.454 us, total = 270.995 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 348 total (1 active), Execution time: mean = 1.814 ms, total = 631.196 ms, Queueing time: mean = 72.538 us, max = 183.426 us, min = 11.269 us, total = 25.243 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 58 total (1 active, 1 running), Execution time: mean = 2.821 ms, total = 163.596 ms, Queueing time: mean = 68.133 us, max = 154.519 us, min = 13.745 us, total = 3.952 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:44:50,450 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:44:50,633 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 309860 total (35 active) [state-dump] Queueing time: mean = 166.691 ms, 
max = 1921.160 s, min = -0.001 s, total = 51650.863 s [state-dump] Execution time: mean = 9.867 ms, total = 3057.476 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 74332 total (0 active), Execution time: mean = 36.800 us, total = 2.735 s, Queueing time: mean = 107.752 us, max = 3.225 ms, min = 1.438 us, total = 8.009 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 74332 total (0 active), Execution time: mean = 525.149 us, total = 39.035 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 35366 total (1 active), Execution time: mean = 3.180 us, total = 112.462 ms, Queueing time: mean = 99.542 us, max = 51.449 ms, min = 3.386 us, total = 3.520 s [state-dump] RaySyncer.OnDemandBroadcasting - 35366 total (1 active), Execution time: mean = 11.670 us, total = 412.722 ms, Queueing time: mean = 92.100 us, max = 51.440 ms, min = 7.347 us, total = 3.257 s [state-dump] ObjectManager.UpdateAvailableMemory - 35365 total (0 active), Execution time: mean = 5.984 us, total = 211.619 ms, Queueing time: mean = 104.121 us, max = 1.031 ms, min = 2.098 us, total = 3.682 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17693 total (1 active), Execution time: mean = 19.269 us, total = 340.924 ms, Queueing time: mean = 76.518 us, max = 13.722 ms, min = 5.381 us, total = 1.354 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14131 total (1 active), Execution time: mean = 455.997 us, total = 6.444 s, Queueing time: mean = 75.019 us, max = 1.472 ms, min = -0.001 s, total = 1.060 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3540 total (1 active), Execution time: mean = 9.550 us, total = 33.806 ms, Queueing time: mean = 181.678 us, max = 2.380 ms, min = 160.000 ns, total = 643.141 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3540 total (1 active), Execution time: mean = 15.371 
us, total = 54.412 ms, Queueing time: mean = 67.171 us, max = 2.582 ms, min = 6.718 us, total = 237.784 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3540 total (1 active), Execution time: mean = 2.971 us, total = 10.516 ms, Queueing time: mean = 185.939 us, max = 2.379 ms, min = 4.508 us, total = 658.223 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3539 total (0 active), Execution time: mean = 102.107 us, total = 361.358 ms, Queueing time: mean = 111.922 us, max = 1.188 ms, min = 4.918 us, total = 396.092 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3539 total (0 active), Execution time: mean = 624.558 us, total = 2.210 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1181 total (1 active), Execution time: mean = 9.230 us, total = 10.901 ms, Queueing time: mean = 72.851 us, max = 363.446 us, min = 7.807 us, total = 86.037 ms [state-dump] NodeManager.GcsCheckAlive - 708 total (1 active), Execution time: mean = 324.251 us, total = 229.570 ms, Queueing time: mean = 618.651 us, max = 2.320 ms, min = 6.025 us, total = 438.005 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 708 total (0 active), Execution time: mean = 55.050 us, total = 38.975 ms, Queueing time: mean = 105.206 us, max = 307.469 us, min = 11.913 us, total = 74.486 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 708 total (0 active), Execution time: mean = 1.559 ms, total = 1.104 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 708 total (1 active), Execution time: mean = 553.665 us, total = 391.995 ms, Queueing time: mean = 389.616 us, max = 1.903 ms, min = 8.454 us, total = 275.848 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 354 total (1 
active), Execution time: mean = 1.814 ms, total = 642.128 ms, Queueing time: mean = 72.296 us, max = 183.426 us, min = 11.269 us, total = 25.593 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total 
= 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 59 total (1 active, 1 running), Execution time: mean = 2.825 ms, total = 166.696 ms, Queueing time: mean = 67.266 us, max = 154.519 us, min = 13.745 us, total = 3.969 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.377 s, total = 2998.642 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 410.773 us, total = 2.465 ms, Queueing time: mean = 111.594 us, max = 238.879 us, min = 29.074 us, total = 669.561 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 8.338 us, total = 33.352 us, Queueing time: mean = 70.909 us, max = 126.591 us, min = 69.398 us, total = 283.634 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:45:50,450 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:45:50,636 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 315098 total (36 active) [state-dump] Queueing time: mean = 163.921 ms, max = 1921.160 s, min = -0.001 s, total = 51651.224 s [state-dump] Execution time: mean = 11.610 ms, total = 3658.319 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 75592 total (0 active), Execution time: mean = 36.719 us, total = 2.776 s, Queueing time: mean = 107.444 us, max = 3.225 ms, min = 1.438 us, total = 8.122 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 75592 total (1 active), Execution time: mean = 524.234 us, total = 39.628 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 35966 total (1 active), Execution time: mean = 3.178 us, total = 114.288 ms, Queueing time: mean = 99.369 us, max = 51.449 ms, min = 3.386 us, total = 3.574 s [state-dump] RaySyncer.OnDemandBroadcasting - 35966 total (1 active), Execution time: mean = 11.643 us, total = 418.755 ms, Queueing time: mean = 91.951 us, max = 51.440 ms, min = 7.347 us, total = 3.307 s [state-dump] ObjectManager.UpdateAvailableMemory - 35965 total (0 active), Execution time: mean = 5.971 us, total = 214.732 ms, Queueing time: mean = 103.889 us, max = 1.031 ms, min = 2.098 us, total = 3.736 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 17993 total (1 active), Execution time: mean = 19.246 us, total = 346.293 ms, Queueing time: mean = 76.349 us, max = 13.722 ms, min = 4.133 us, total = 1.374 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14371 total (1 active), Execution time: mean = 455.802 us, total = 6.550 s, Queueing time: mean = 74.894 us, max = 1.472 ms, min = -0.001 s, total = 1.076 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3600 total (1 active), Execution time: mean = 9.538 us, total = 34.336 ms, Queueing time: mean = 182.114 us, max = 2.380 ms, min = 160.000 ns, total = 655.612 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3600 total (1 active), Execution time: mean = 15.334 us, total = 55.201 ms, Queueing time: mean = 67.053 us, max = 2.582 ms, min = 6.718 us, total = 241.391 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3600 total (1 active), Execution time: mean = 2.969 us, total = 10.689 ms, Queueing time: mean = 186.365 us, max = 2.379 ms, min = 4.508 us, total = 670.913 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3599 total (0 active), Execution time: mean = 102.039 us, total = 367.238 ms, Queueing time: mean = 111.657 us, max = 1.188 ms, min = 4.918 us, total = 401.854 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3599 total (0 active), Execution time: mean = 623.835 us, total = 2.245 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1201 total (1 active), Execution time: mean = 9.242 us, total = 11.100 ms, Queueing time: mean = 72.895 us, max = 363.446 us, min = 7.807 us, total = 87.547 ms [state-dump] NodeManager.GcsCheckAlive - 720 total (1 active), Execution time: mean = 323.507 us, total = 232.925 ms, Queueing time: mean = 621.511 us, max = 2.445 ms, min = 6.025 us, total = 447.488 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 720 total (0 active), Execution time: mean = 54.913 us, total = 39.538 ms, Queueing time: mean = 105.055 us, max = 307.469 us, min = 11.913 us, total = 75.640 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 720 total (0 active), Execution time: mean = 1.557 ms, total = 1.121 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 720 total (1 active), Execution time: mean = 552.862 us, total = 398.061 ms, Queueing time: mean = 392.670 us, max = 2.257 ms, min = 8.454 us, total = 282.722 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 360 total (1 active), Execution time: mean = 1.819 ms, total = 654.924 ms, Queueing time: mean = 72.429 us, max = 183.426 us, min = 11.269 us, total = 26.074 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 60 total (1 active, 1 running), Execution time: mean = 2.827 ms, total = 169.620 ms, Queueing time: mean = 67.193 us, max = 154.519 us, min = 13.745 us, total = 4.032 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.831 s, total = 3598.645 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 422.222 us, total = 2.956 ms, Queueing time: mean = 100.169 us, max = 238.879 us, min = 29.074 us, total = 701.180 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 8.422 us, total = 42.108 us, Queueing time: mean = 68.676 us, max = 126.591 us, min = 59.744 us, total = 343.378 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:46:50,450 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:46:50,638 I 6312 6312] (raylet) 
node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 
846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks 
remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 320327 total (35 active) [state-dump] Queueing time: mean = 161.247 ms, 
max = 1921.160 s, min = -0.001 s, total = 51651.654 s [state-dump] Execution time: mean = 11.424 ms, total = 3659.272 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 76851 total (0 active), Execution time: mean = 36.730 us, total = 2.823 s, Queueing time: mean = 107.553 us, max = 3.225 ms, min = 1.438 us, total = 8.266 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 76851 total (0 active), Execution time: mean = 524.551 us, total = 40.312 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 36565 total (1 active), Execution time: mean = 3.180 us, total = 116.279 ms, Queueing time: mean = 99.472 us, max = 51.449 ms, min = 3.386 us, total = 3.637 s [state-dump] RaySyncer.OnDemandBroadcasting - 36565 total (1 active), Execution time: mean = 11.658 us, total = 426.276 ms, Queueing time: mean = 92.044 us, max = 51.440 ms, min = 7.347 us, total = 3.366 s [state-dump] ObjectManager.UpdateAvailableMemory - 36564 total (0 active), Execution time: mean = 5.982 us, total = 218.736 ms, Queueing time: mean = 104.132 us, max = 1.031 ms, min = 2.098 us, total = 3.807 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18293 total (1 active), Execution time: mean = 19.294 us, total = 352.947 ms, Queueing time: mean = 76.433 us, max = 13.722 ms, min = 4.133 us, total = 1.398 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14610 total (1 active), Execution time: mean = 455.979 us, total = 6.662 s, Queueing time: mean = 74.972 us, max = 1.472 ms, min = -0.001 s, total = 1.095 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3660 total (1 active), Execution time: mean = 9.534 us, total = 34.896 ms, Queueing time: mean = 182.240 us, max = 2.380 ms, min = 160.000 ns, total = 666.997 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3660 total (1 active), Execution time: mean = 15.320 
us, total = 56.072 ms, Queueing time: mean = 67.115 us, max = 2.582 ms, min = 6.718 us, total = 245.639 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3660 total (1 active), Execution time: mean = 2.969 us, total = 10.866 ms, Queueing time: mean = 186.492 us, max = 2.379 ms, min = 4.508 us, total = 682.560 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3659 total (0 active), Execution time: mean = 102.055 us, total = 373.419 ms, Queueing time: mean = 111.626 us, max = 1.188 ms, min = 4.918 us, total = 408.441 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3659 total (0 active), Execution time: mean = 623.754 us, total = 2.282 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1221 total (1 active), Execution time: mean = 9.255 us, total = 11.300 ms, Queueing time: mean = 72.886 us, max = 363.446 us, min = 7.807 us, total = 88.994 ms [state-dump] NodeManager.GcsCheckAlive - 732 total (1 active), Execution time: mean = 323.765 us, total = 236.996 ms, Queueing time: mean = 621.966 us, max = 2.445 ms, min = 6.025 us, total = 455.279 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 732 total (0 active), Execution time: mean = 54.995 us, total = 40.256 ms, Queueing time: mean = 105.156 us, max = 307.469 us, min = 11.913 us, total = 76.974 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 732 total (0 active), Execution time: mean = 1.557 ms, total = 1.140 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 732 total (1 active), Execution time: mean = 552.904 us, total = 404.726 ms, Queueing time: mean = 393.299 us, max = 2.257 ms, min = 8.454 us, total = 287.895 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 366 total (1 
active), Execution time: mean = 1.820 ms, total = 666.073 ms, Queueing time: mean = 72.502 us, max = 183.426 us, min = 11.269 us, total = 26.536 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total 
= 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 61 total (1 active, 1 running), Execution time: mean = 2.826 ms, total = 172.416 ms, Queueing time: mean = 67.959 us, max = 154.519 us, min = 13.745 us, total = 4.145 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 
8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.831 s, total = 3598.645 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 422.222 us, total = 2.956 ms, Queueing time: mean = 100.169 us, max = 238.879 us, min = 29.074 us, total = 701.180 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 8.422 us, total = 42.108 us, Queueing time: mean = 68.676 us, max = 126.591 us, min = 59.744 us, total = 343.378 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:47:50,451 I 6312 6341] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:47:50,642 I 6312 6312] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 325557 total (35 active) [state-dump] Queueing time: mean = 158.658 ms, max = 1921.160 s, min = -0.001 s, total = 51652.085 s [state-dump] Execution time: mean = 11.243 ms, total = 3660.249 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 78111 total (0 active), Execution time: mean = 36.749 us, total = 2.870 s, Queueing time: mean = 107.664 us, max = 3.225 ms, min = 1.438 us, total = 8.410 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 78111 total (0 active), Execution time: mean = 525.083 us, total = 41.015 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.CheckGC - 37164 total (1 active), Execution time: mean = 3.183 us, total = 118.305 ms, Queueing time: mean = 99.591 us, max = 51.449 ms, min = 3.386 us, total = 3.701 s [state-dump] RaySyncer.OnDemandBroadcasting - 37164 total (1 active), Execution time: mean = 11.667 us, total = 433.602 ms, Queueing time: mean = 92.156 us, max = 51.440 ms, min = 7.347 us, total = 3.425 s [state-dump] ObjectManager.UpdateAvailableMemory - 37163 total (0 active), Execution time: mean = 5.994 us, total = 222.741 ms, Queueing time: mean = 104.292 us, max = 1.031 ms, min = 2.098 us, total = 3.876 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 18593 total (1 active), Execution time: mean = 19.299 us, total = 358.829 ms, Queueing time: mean = 76.527 us, max = 13.722 ms, min = 4.133 us, total = 1.423 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14850 total (1 active), Execution time: mean = 456.294 us, total = 6.776 s, Queueing time: mean = 75.025 us, max = 1.472 ms, min = -0.001 s, total = 1.114 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 3720 total (1 active), Execution time: mean = 9.522 us, total = 35.421 ms, Queueing time: mean = 182.483 us, max = 2.380 ms, min = 160.000 ns, total = 678.837 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3720 total (1 active), Execution time: mean = 15.305 us, total = 56.935 ms, Queueing time: mean = 67.116 us, max = 2.582 ms, min = 6.718 us, total = 249.671 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3720 total (1 active), Execution time: mean = 2.969 us, total = 11.044 ms, Queueing time: mean = 186.726 us, max = 2.379 ms, min = 4.508 us, total = 694.622 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3718 total (0 active), Execution time: mean = 102.138 us, total = 379.749 ms, Queueing time: mean = 111.815 us, max = 1.188 ms, min = 4.918 us, total = 415.729 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3718 total (0 active), Execution time: mean = 624.442 us, total = 2.322 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1241 total (1 active), Execution time: mean = 9.249 us, total = 11.478 ms, Queueing time: mean = 72.818 us, max = 363.446 us, min = 7.807 us, total = 90.368 ms [state-dump] NodeManager.GcsCheckAlive - 744 total (1 active), Execution time: mean = 324.222 us, total = 241.221 ms, Queueing time: mean = 622.684 us, max = 2.445 ms, min = 6.025 us, total = 463.277 ms 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 744 total (0 active), Execution time: mean = 55.105 us, total = 40.998 ms, Queueing time: mean = 105.319 us, max = 307.469 us, min = 11.913 us, total = 78.357 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 744 total (0 active), Execution time: mean = 1.559 ms, total = 1.160 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 744 total (1 active), Execution time: mean = 553.362 us, total = 411.701 ms, Queueing time: mean = 394.064 us, max = 2.257 ms, min = 8.454 us, total = 293.183 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 372 total (1 active), Execution time: mean = 1.822 ms, total = 677.846 ms, Queueing time: mean = 72.716 us, max = 183.426 us, min = 11.269 us, total = 27.050 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s [state-dump] ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 
101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms [state-dump] - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms [state-dump] RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 62 total (1 active, 1 running), Execution time: mean = 2.819 ms, total = 174.775 ms, Queueing time: mean = 69.590 us, max = 169.082 us, min = 13.745 us, total = 4.315 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms [state-dump] NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.831 s, total = 3598.645 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 422.222 us, total = 2.956 ms, Queueing time: mean = 100.169 us, max = 238.879 us, min = 29.074 us, total = 701.180 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 8.422 us, total = 42.108 us, Queueing time: mean = 68.676 us, max = 126.591 us, min = 59.744 us, total = 343.378 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 
total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 22:48:19,664 I 6312 6312] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false [2025-01-20 22:48:19,664 I 6312 6312] (raylet) node_manager.cc:1586: Driver (pid=3836) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000 [2025-01-20 22:48:19,670 I 6312 6312] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. [2025-01-20 22:48:19,775 I 6312 6312] (raylet) main.cc:454: received SIGTERM. 
Existing local drain request = None
[2025-01-20 22:48:19,775 I 6312 6312] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM
[2025-01-20 22:48:19,775 I 6312 6312] (raylet) main.cc:258: Shutting down...
[2025-01-20 22:48:19,775 I 6312 6312] (raylet) accessor.cc:510: Unregistering node node_id=ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd
[2025-01-20 22:48:19,778 I 6312 6312] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd
[2025-01-20 22:48:19,787 I 6312 6312] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 6377.
[2025-01-20 22:48:19,799 I 6312 6380] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0.
[2025-01-20 22:48:19,799 I 6312 6312] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 6381.
[2025-01-20 22:48:19,810 I 6312 6382] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0.
[2025-01-20 22:48:19,811 I 6312 6312] (raylet) io_service_pool.cc:47: IOServicePool is stopped.
[2025-01-20 22:48:19,866 I 6312 6312] (raylet) stats.h:120: Stats module has shutdown.